Directed Evolution of Oxidase Enzymes



This invention is concerned with the production of modified enzymes, particularly 
oxidase enzymes, and more particularly galactose oxidase enzymes. Recombinant techniques 
such as directed evolution are used to obtain polynucleotide and polypeptide products having 
desirable properties. Galactose oxidase variants with increased activity and increased 
thermostability relative to the wild-type enzyme are described. 



An "oxidation enzyme" is an enzyme that catalyzes one or more oxidation reactions, 
typically by adding, inserting, contributing or transferring oxygen from a source or donor to a 
substrate. Such enzymes are also called oxidoreductases or redox enzymes, and encompass
oxygenases, hydrogenases or reductases, oxidases and peroxidases. One such enzyme is
galactose oxidase. This invention relates to the selection and production of polynucleotides that
encode polypeptides or proteins with biological activity as oxidation enzymes, and in particular
galactose oxidase enzymes. These enzymes are produced in facile expression systems such as
robust prokaryotic cells (e.g. bacteria) and eukaryotic systems (e.g. fungi and yeast).

BACKGROUND OF THE INVENTION

Field of the Invention

The invention concerns the recombinant production of functional eukaryotic proteins by
host cells, in high yield, with increased activity, and/or with increased stability, e.g.
thermostability. Preferred proteins of the invention include oxidase enzymes (oxidases) such as
polypeptides evolved from galactose oxidase (D-galactose:oxygen 6-oxidoreductase or GAO;
EC 1.1.3.9). Polynucleotides which encode and express these proteins in recombinant host cell
expression systems, and the resulting polypeptides, are encompassed by the invention.

The publications and reference materials noted herein and listed in the appended 
Bibliography are each incorporated by reference in their entirety. They are referenced 
numerically in the text and the Bibliography below. 

Production of Enzyme Variants

Many proteins of interest are produced by organisms having "eukaryotic" cells. These
are cells having a nucleus surrounded by its own membrane and containing DNA on structures
called chromosomes. All multicellular organisms, such as humans and other animals, and many
single-cell organisms, have eukaryotic cells. Other single-cell organisms, such as bacteria, have
"prokaryotic" cells. These cells have a primitive nucleus with DNA in a defined structure, but
without the chromosomes and nuclear membrane that are characteristic of eukaryotes. Prokaryotic
organisms are generally much easier and less costly to grow, maintain and manipulate than
eukaryotic organisms.

Genetic engineering and recombinant DNA and RNA technologies have made it possible
to produce proteins, hormones and enzymes that are native to one organism, by using the cells
of a different organism as "factories" or host cell expression systems. In particular, it is often
desirable to express a protein of eukaryotic origin in a prokaryotic host cell, because the
prokaryotes can be grown in large quantities of identical cells, to produce large amounts of the
desired foreign protein. For example, certain human proteins may be useful as drugs if they can
be supplied in sufficient quantity to patients who have a protein deficiency. Such proteins may
not easily or ethically be obtained by isolating them from human cells, nor can they easily be
made by direct chemical synthesis or by growing them in isolated tissue cultures. Other proteins
and enzymes are useful in industry. For example, certain enzymes can break down food
products, and are useful in laundry detergent. However, commercial applications require large
amounts of protein and a high degree of quality control. Desirable applications also require or
would benefit from more active or more thermostable (heat resistant) proteins or enzymes.

To solve some of these problems, recombinant genetic engineering techniques have been 
developed to use genetic machinery of other cells, such as bacteria and yeast, to produce human 
or other proteins. Selected genetic material, such as a polynucleotide that encodes a desired 
protein, is "recombined" with genetic material in a host cell, so that the host cell expresses the
introduced foreign genetic material and produces the desired polypeptide or protein. Bacteria,
fungi and yeast can be suitable host cells because they are easy and economical to grow and
maintain in large quantities, and can be used to reliably and repeatably produce foreign proteins. 
Some proteins that are made by cells can be secreted or delivered outside the cell, which can 
improve the yield and the efficiency of subsequent isolation and purification steps. 

Directed evolution has been successfully applied to improve a variety of enzyme 
properties, such as substrate specificity, activity in organic solvents, and stability at high 
temperatures, which are often critical for industrial applications (5). This evolutionary approach 
uses DNA shuffling, for simultaneous random mutagenesis and recombination, to generate a 
variant having an improved desirable property over the existing wild type protein. Point 
mutations are generated due to the intrinsic infidelity of Taq-based polymerase chain reactions 
(PCR) associated with reassembly of nucleic acid sequences. In one example, Stemmer and 
coworkers applied this technique to the gene encoding green fluorescent protein (GFP),
which resulted in a protein that folded better than the wild type in E. coli (10). Other examples
are in the literature (11-18, 21-25, 27-34, 47-58, 60-63, 65-75). Eukaryotic enzymes have a
myriad of existing and potential applications, but improvement of these and other proteins by 
directed evolution is desirable. For example, the difficulty of expressing certain oxidase enzymes 
in a facile expression host has posed technical challenges. Efforts to modify these enzymes for 
industrial applications by protein engineering methods have been impeded. Directed evolution, 
for example, exploits expression in a host such as E. coli or S. cerevisiae, organisms in which 
large libraries of mutants or variants can be made. Also, the lack of efficient expression in an 
appropriate foreign (heterologous) host can prevent the mass production of some of these 
proteins on an economical scale. Thus, there continues to be a need for new ways to produce
new proteins, and for new proteins and enzymes having new or enhanced biological properties. 

Galactose Oxidase Enzymes 

One protein of interest is the oxidation enzyme galactose oxidase. Galactose oxidase
(D-galactose:oxygen 6-oxidoreductase, GAO; EC 1.1.3.9) is an enzyme containing a single copper
ion, and is secreted by a number of fungal species. Fusarium NRRL 2903, formerly known as
Dactylium dendroides, has been the most extensively studied (76). The enzyme is a glycoprotein
with a carbohydrate content of about 1.7% and consists of a single polypeptide chain of 639
amino acid residues with molecular mass of 68,000 Da (77, 78). The reaction catalyzed by GAO
is the oxidation of primary alcohols to the corresponding aldehydes, coupled to the two-electron
reduction of O2 to hydrogen peroxide (79).

The enzyme oxidizes an unusually broad range of substrates. It accepts D-galactose 
(FIG. 1), alpha- and beta-galactopyranosides, oligo- and polysaccharides and considerably 
smaller molecules, such as glycerol and allyl alcohol, as substrates (77, 80-82). GAO exhibits 
prochiral (only the pro-S hydrogen is abstracted) as well as enantiomeric specificity for galactose 
(only D-galactose is oxidized by the enzyme) (80, 83). Furthermore, GAO strictly discriminates 
against D-glucose, the C-4 epimer of D-galactose, as a substrate or ligand. D-glucose does not 
bind to GAO at concentrations as high as 1 M (80, 84). The kinetic parameters of GAO for the 
oxidation of galactose are: K_m = 67 mM, k_cat = 3,000 sec^-1, k_cat/K_m = 45 x 10^3 M^-1 sec^-1 (85).
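
As a worked illustration of the kinetic parameters quoted above, the following Python sketch recomputes the catalytic efficiency from the reported K_m and k_cat values. The function names are hypothetical, and the rate calculation assumes simple Michaelis-Menten kinetics, which the text above does not itself state.

```python
# Reported parameters for galactose oxidation by GAO (reference 85 in the text).
KM_MOLAR = 0.067        # K_m = 67 mM, expressed in mol/l
KCAT_PER_SEC = 3000.0   # k_cat = 3,000 per second


def catalytic_efficiency(kcat_per_sec, km_molar):
    """Return k_cat / K_m in units of M^-1 sec^-1."""
    return kcat_per_sec / km_molar


def turnover_rate(substrate_molar, kcat_per_sec, km_molar):
    """Per-enzyme rate (per second) at a given substrate concentration.

    Assumes Michaelis-Menten kinetics: v / [E] = k_cat * [S] / (K_m + [S]).
    """
    return kcat_per_sec * substrate_molar / (km_molar + substrate_molar)


if __name__ == "__main__":
    efficiency = catalytic_efficiency(KCAT_PER_SEC, KM_MOLAR)
    print(f"k_cat/K_m is roughly {efficiency:.3g} M^-1 sec^-1")
    # At [S] = K_m the enzyme runs at half of k_cat under this model.
    print(turnover_rate(KM_MOLAR, KCAT_PER_SEC, KM_MOLAR))
```

The computed ratio rounds to the 45 x 10^3 M^-1 sec^-1 figure given in the text. Under the assumed model, a variant with a lower K_m or a higher k_cat would show a proportionally higher efficiency.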

The crystal structure of GAO has been reported (86). It consists of three predominantly 
beta-structure domains. The copper ion lies on the solvent-accessible surface of the second and 
largest domain (residues 156-532) (78, 87). Tyr-272, Tyr-495, His-496, His-581 and a water 
molecule are the copper ligands at pH 7.0. The crystal structure also reveals a novel thioether 
bond linking Cys-228 and Tyr-272 and supports the presence of a tyrosine free radical at the 
active site (79). The active site structure of GAO is shown in FIG. 2. Site-directed mutagenesis
of Tyr-495 and Cys-228 has confirmed their involvement in catalysis (85, 88).

GAO is useful in a wide variety of applications, ranging from analytical and food
chemistry to chemoenzymatic synthesis and clinical testing. For example, biological sensors
based on GAO have been developed to determine the content of galactose (89), lactose and
other GAO substrates (90). Such biosensors have also been used for quality control in dairy
industries (91, 92), online bioprocess monitoring (93) and analysis of blood samples of patients
with suspected galactosemia (94). The stereospecificity and broad substrate specificity of GAO
have been exploited in the chemoenzymatic synthesis of L-sugars from polyols (95), which are
usually difficult to prepare by chemical methods (96, 97), as well as sugar-containing polyamines
(98) and 5-C-(hydroxymethyl)hexoses (99). GAO applications in synthesis have been limited due
to its relatively low activity toward a large number of primary alcohols (100). GAO is also
used for the detection of the disaccharide D-galactose-beta-(1->3)-N-acetylgalactosamine
(Gal-GalNAc), a tumor marker in colonic cancer and precancer, and provides a cost-effective
screening test for patients with neoplasia or at risk of developing neoplasia (101, 102).
GAO also finds applications in food chemistry. For example, it has been used
in oxidized guar manufacture (103) and to treat the oligosaccharide fraction contained in honey
(104). Finally, GAO is used to oxidize the cell surface polysaccharides of membrane-bound
glycoproteins containing terminal non-reducing galactose residues; this is an essential step in the
successful radiolabeling of these glycoconjugates (105, 106).

Modified and particularly improved or optimized GAO enzymes are useful to improve
and expand the use of the enzyme in practical applications. For example, enzymes of the
invention include GAO variants that are more active, more thermostable, or both. Increased
activity and/or expression as well as high thermostability may significantly decrease the cost of
enzyme production, simplify its purification and handling, and prolong its shelf-life. Other
properties of the enzyme may also be varied, for example to optimize activity towards particular
substrates or toward other substrates such as polymeric materials and glucose.

Use of these evolved enzymes in biosensors and diagnostics can increase sensitivity,
decrease the response time and enhance the detection range. In addition, a more stable enzyme
will find applications in the construction of biosensors with prolonged stability. An evolved
GAO with improved activity toward poor GAO substrates, such as allyl alcohol and glucose, will
provide new and improved applications of the enzyme in organic synthesis and other sensor
applications. For chemical synthesis applications, selective oxidation of alcohols to the
corresponding aldehydes avoids the use of protecting groups, minimizes side reactions often
observed in traditional chemical synthesis, and is an environmentally friendly process. Use of
such GAO enzymes as a synthetic reagent would facilitate the use of more inexpensive, safe and
biodegradable carbohydrate materials in industrial processes (107).

A more efficient enzyme is expected to be advantageous in the food chemistry 
applications of GAO, and in particular in the selective modification of guar and other
carbohydrate-based polymers. GAO variants according to the invention would also be useful 
for modification of carbohydrate-based (e.g. cellulosic) textiles and other materials. The 
aldehyde function produced by the GAO can be used to couple other substances selectively at 
the modified position on the polymer. 

Accordingly, there is a need to develop new and improved GAO enzymes, as well as 
methods for expressing such proteins. In particular, there is a need for protein expression 
methods which are well-suited for use in connection with directed evolution techniques. 

This invention describes methods for screening libraries of GAO mutants produced by
error-prone PCR and DNA shuffling, to identify mutants that are expressed in bacteria (e.g.
E. coli) and have improved GAO function. Micro-plate and membrane screening techniques are
disclosed. In one embodiment, the mutant is a functional and active galactose oxidase (GAO)
that is expressed in E. coli at levels of about 65 times the activity of a parent recombinant wild
type (for D-galactose). The activity for other substrates, such as allyl alcohol, is also about 65
times that of wild type. Mutants of the invention can have any fraction or multiple of the
corresponding wild type activity, but preferably are more active, e.g. about 2 to 200 times as
active. Mutants also are more thermostable. Enzyme yield is generally at least about 10 mg/l.

SUMMARY OF THE INVENTION 

The observed constraints on the use of native proteins are thought to be a consequence 
of evolution. Proteins have evolved in the context and environment of a living organism, to carry 
out specific biological functions under conditions conducive to life - not in the laboratory or 
under industrial conditions. In some cases, evolution may favor or even require less than 
optimally efficient enzymes. The output, efficiency, working conditions, stability and other
properties of known expression systems are not thought to be unalterable, nor are they 
limitations which should be seen as intrinsic to the nature of cellular expression systems. It is 
possible that the proteins used in these systems can be evolved in vitro, or that analogous 
proteins can be otherwise developed, to alter or enhance the protein's properties, for example, 
to obtain much more efficient expression, activity and thermostability. Improved proteins can 
also be obtained by screening cultures of native organisms or expressed gene libraries (3). 

The invention provides a method for improving the expression, thermostability, and/or 
the activity toward one or more substrates, of a polynucleotide encoding oxidase enzymes by 
using directed evolution. The invention also provides polynucleotides encoding variant
oxidase enzymes which have improved properties in conventional expression systems. 
According to one embodiment of the invention, directed evolution or random mutagenesis is 
used to produce GAO variants which are more highly expressed, more active, and/or more 
thermostable in prokaryotic expression systems such as E. coli. 

The above features and many other attendant advantages of the invention will become 
better understood by reference to the following detailed description when taken in conjunction 
with the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 shows a reaction scheme in which a D-galactose substrate is oxidized to produce
a D-galactohexodialdose product, in the presence of galactose oxidase (GAO) enzyme.

FIG. 2 shows the active site structure of GAO at pH 7.0.

FIG. 3 is a graph showing the effect of metal ions (particularly copper ions) on the
activity of a recombinant wild-type GAO, pGAO-010. Enzyme solutions with additives were
kept at 4 °C for 1 hr before assay. Relative activity of enzyme solution with 1 mM copper (II)
sulfate was estimated as 100 %.

FIG. 4 is a graph showing GAO activity for various clones generated by error-prone
PCR, with varying concentrations of MnCl2, using conditions A of TABLE 3.

FIG. 5 is a graph showing GAO activity for various clones generated by error-prone
PCR, with varying concentrations of MnCl2, using conditions C of TABLE 3.

FIG. 6 shows the sequences of PCR primers used herein for amplification, e.g. of the
whole galactose oxidase gene.

FIG. 7 is a schematic representation of the construction of plasmid pUC18-EHL.

FIG. 8 is a schematic representation of the construction of plasmid pGAO-010.

FIG. 9 is a schematic representation of the construction of plasmids pGAO-027 and
pGAO-036.

FIG. 10 is a schematic representation of the construction of plasmids pGAO-006 and
pGAO-011.

FIG. 11 shows the structures and activities of representative plasmids encoding GAO
according to the invention, with IPTG-induced expression in host E. coli. Permeable cells which
were treated by freeze (-20 °C), thaw (4 °C) and 0.5 mg/l lysozyme for 30 minutes at 37 °C were
used for assay. An activity given as * indicates that cells did not grow in test tube culture;
** indicates that a transformant was not obtained.

FIG. 12 shows a scheme for the design of plasmids according to the invention.

FIG. 13 shows the structures and activities of additional plasmids encoding GAO
according to the invention, with IPTG-induced expression in host E. coli.

FIG. 14 is a graph comparing the GAO activities of GAO plasmids with and without
random codon alternation.

FIG. 15 shows substrate specificities for a wild type galactose oxidase and a recombinant
galactose oxidase enzyme of the invention. Partially purified galactose oxidase from D.
dendroides (Sigma) and cell-free extract from E. coli BL21(DE3)/pGAO-010 were used.
Relative activities for D-galactose were estimated as 100 %. (+) indicates that oxidation was
detected, but activities were too low to be estimated; n.d. indicates that activities were not
distinguishable from background absorbance levels.



CIT-3183 



ATTORNEY DOCKET 1G811-US1 



FIG. 16 is a graph showing the thermal stability of selected GAO mutants.

FIGS. 17A-C show the sequence of representative mutant 9.16.8D2 of the invention
[SEQ. ID NO. 10]

FIGS. 18A-C show the sequence of representative mutant 9.16.6C11 of the invention
[SEQ. ID NO. 11]

FIGS. 19A-C show the sequence of representative mutant 9.16.16D12 of the invention
[SEQ. ID NO. 12]

FIGS. 20A-C show the sequence of representative mutant 11.03.6D3 of the invention
[SEQ. ID NO. 13]

FIGS. 21A-C show the sequence of representative mutant 11.03.10C3 of the invention
[SEQ. ID NO. 14]

FIGS. 22A-C show the sequence of representative mutant 11.03.10D6 of the invention
[SEQ. ID NO. 15]

FIGS. 23A-C show the sequence of representative mutant 11.03.13E12 of the invention
[SEQ. ID NO. 16]

FIGS. 24A-C show the sequence of representative mutant L06.20E7 of the invention
[SEQ. ID NO. 17]

FIGS. 25A-C show the sequence of representative mutant 1.D4 of the invention
[SEQ. ID NO. 18]

FIGS. 26A-C show the sequence of representative mutant 2G4 of the invention
[SEQ. ID NO. 19]

FIGS. 27A-C show the sequence of representative mutant 3.H7 of the invention
[SEQ. ID NO. 20]

FIGS. 28A-C show the sequence of representative mutant 4.F12 of the invention
[SEQ. ID NO. 21]

DETAILED DESCRIPTION OF THE INVENTION

This invention concerns methods for improving the expression, activity and/or 
thermostability of proteins using facile or conventional expression systems. 

Definitions 

As used herein, "about" or "approximately" shall mean within 20 percent, preferably 
within 10 percent, and more preferably within 5 percent of a given value or range. 
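
The tolerance bands in this definition can be sketched as a simple numeric check. The helper name `is_about` and its default arguments are hypothetical and are offered only to illustrate the 20, 10 and 5 percent bands stated above.

```python
def is_about(measured, target, tolerance=0.20):
    """Return True if `measured` lies within `tolerance` (a fraction) of `target`.

    The default of 0.20 mirrors the broadest band in the definition; pass
    0.10 or 0.05 for the "preferably" and "more preferably" bands.
    """
    return abs(measured - target) <= tolerance * abs(target)


if __name__ == "__main__":
    # A 70-fold improvement is "about 65 times" at the 20 percent band.
    print(is_about(70, 65))
    # The same value falls outside the 5 percent band.
    print(is_about(70, 65, tolerance=0.05))
```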

The term "substrate" means any substance or compound that is converted or meant to 
be converted into another compound by the action of an enzyme catalyst. The term includes 
aromatic and aliphatic compounds, and includes not only a single compound, but also 
combinations of compounds, such as solutions, mixtures and other materials which contain at 
least one substrate. 

An "oxidation reaction" or "oxygenation reaction", as used herein, is a chemical or 
biochemical reaction involving the addition of oxygen to a substrate, to form an oxygenated or 
oxidized substrate or product. An oxidation reaction is typically accompanied by a reduction 
reaction (hence the term "redox" reaction, for oxidation and reduction). A compound is 
"oxidized" when it receives oxygen or loses electrons. A compound is "reduced" when it loses 
oxygen or gains electrons. GAO typically catalyzes the oxidation of a primary alcohol group 
to an aldehyde. 

The term "enzyme" means any substance composed wholly or largely of protein or 
polypeptides that catalyzes or promotes, more or less specifically, one or more chemical or 
biochemical reactions. 

A "polypeptide" (one or more peptides) is a chain of chemical building blocks called 
amino acids that are linked together by chemical bonds called peptide bonds. A protein or 
polypeptide, including an enzyme, may be "native" or "wild-type", meaning that it occurs in 
nature or has the amino acid sequence of a native protein, respectively. These terms are 
sometimes used interchangeably. A polypeptide may or may not be glycosylated. A 
"recombinant wild-type" typically means the wild type sequence in a recombinant host without 
glycosylation. Comparisons in the examples and figures of this application are generally with
reference to a wild type that is a recombinant wild type. A polypeptide may also be a "mutant",
"variant" or "modified", meaning that it has been made, altered, derived, or is in some way
different or changed from a native protein, or from another mutant. A native wild type protein
comprises the natural sequence of amino acids in the polypeptide and typically includes
glycosylation. A "parent" polypeptide or enzyme is any polypeptide or enzyme from which any
other polypeptide or enzyme is derived or made, using any methods, tools or techniques, and
whether or not the parent is itself a native or mutant polypeptide or enzyme. A parent
polynucleotide is one that encodes a parent polypeptide. A "test enzyme" is a protein-containing
substance that is tested to determine whether it has properties of an enzyme. The term "enzyme"
can also refer to a catalytic polynucleotide (e.g. RNA or DNA).

The "activity" of an enzyme is a measure of its ability to catalyze a reaction, and may be 
expressed as the rate at which the product of the reaction is produced. For example, enzyme 
activity can be represented as the amount of product produced per unit of time, per unit (e.g.
concentration or weight) of enzyme. The "stability" of an enzyme means its ability to function,
over time, in a particular environment or under particular conditions. One way to evaluate
stability is to assess its ability to resist a loss of activity over time, under given conditions.
Enzyme stability can also be evaluated in other ways, for example, by determining the relative
degree to which the enzyme is in a folded or unfolded state. Thus, one enzyme is more stable
than another, or has improved stability, when it is more resistant than the other enzyme to a loss
of activity under the same conditions, is more resistant to unfolding, or is more durable by any
suitable measure. For example, a more "thermally stable" or "thermostable" enzyme is one that
is more resistant to loss of structure (unfolding) or function (enzyme activity) when exposed to
heat or an elevated temperature. One way to evaluate this is to determine the "melting
temperature" or Tm for the protein. The melting temperature, also called a midpoint, is the
temperature at which half of the protein is unfolded from its fully folded state. This midpoint is
typically determined by calculating the midpoint of a titration curve that plots protein unfolding
as a function of temperature. Thus, a protein with a higher Tm requires more heat to cause
unfolding and is more stable or more thermostable. Stated another way, a higher
Tm indicates that fewer molecules of that protein are unfolded at the same temperature than for a
protein with a lower Tm, again meaning that the protein which is more resistant to unfolding is
more stable (it has less unfolding at the same temperature). Another measure of stability is T1/2
or T50, which is the transition midpoint of the inactivation curve of the protein as a function of
temperature. T1/2 is the temperature at which the protein loses half of its activity. Thus, a
protein with a higher T1/2 requires more heat to deactivate it, and is more stable or more
thermostable. Stated another way, a higher T1/2 indicates that fewer molecules of
that protein are inactive at the same temperature than for a protein with a lower T1/2, again meaning
that the protein which is more resistant to deactivation is more stable (it has more activity at the
same temperature). These assays are also called "thermal shift" assays, because the inactivation
or unfolding curve, plotted against temperature, is "shifted" to higher or lower temperatures
when stability increases or decreases. Thermostability can also be measured in other ways. For
example, a longer half-life (t1/2) for the enzyme's activity at elevated temperature is an indication
of thermostability.
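
The midpoint and half-life measures described above can be sketched numerically. The Python below is a minimal illustration only: the function names and all data values are hypothetical, the midpoint is located by linear interpolation between measured points (one of several possible curve-fitting choices), and the half-life formula assumes first-order (exponential) loss of activity, which the text does not specify.

```python
import math


def midpoint_temperature(temps_c, residual_fraction):
    """Temperature at which the residual activity (or folded fraction) is 0.5.

    Uses linear interpolation between the two measurements that bracket 0.5.
    `temps_c` must be ascending and `residual_fraction` non-increasing.
    """
    points = list(zip(temps_c, residual_fraction))
    for (t_lo, a_lo), (t_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            return t_lo + (a_lo - 0.5) * (t_hi - t_lo) / (a_lo - a_hi)
    raise ValueError("curve does not cross 50 percent")


def half_life(rate_constant):
    """Half-life for first-order inactivation: t1/2 = ln(2) / k (assumed model)."""
    return math.log(2) / rate_constant


if __name__ == "__main__":
    temps = [50, 60, 70, 80]                 # hypothetical assay temperatures, deg C
    parent = [1.0, 0.75, 0.25, 0.0]          # hypothetical residual activity
    variant = [1.0, 1.0, 0.75, 0.25]         # hypothetical, curve shifted upward
    shift = midpoint_temperature(temps, variant) - midpoint_temperature(temps, parent)
    print(f"midpoint shift: {shift} deg C")  # a positive shift means more thermostable
```

The same interpolation applies to an unfolding curve (giving Tm) or to an inactivation curve (giving T1/2 or T50); only the measured quantity differs.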

An "oxidation enzyme" is an enzyme that catalyzes one or more oxidation reactions, 
typically by adding, inserting, contributing or transferring oxygen from a source or donor to a 
substrate. Such enzymes are also called oxidoreductases or redox enzymes, and encompass
oxygenases, hydrogenases or reductases, oxidases and peroxidases.

The terms "oxygen donor", "oxidizing agent" and "oxidant" mean a substance, molecule 
or compound which donates oxygen to a substrate in an oxidation reaction. Typically, the 
oxygen donor is reduced (accepts electrons). Exemplary oxygen donors, which are not limiting, 
include molecular oxygen or dioxygen (O2) and peroxides, including alkyl peroxides such as
t-butyl peroxide, and most preferably hydrogen peroxide (H2O2). A peroxide is any compound
having two oxygen atoms bound to each other. 

A "luminescent" substance means any substance which produces detectable 
electromagnetic radiation, or a change in electromagnetic radiation, most notably visible light, 
by any mechanism, including color change, UV absorbance, fluorescence and phosphorescence. 
Preferably, a luminescent substance according to the invention produces a detectable color, 
fluorescence or UV absorbance. The term "chemiluminescent agent" means any luminescent 
substance which enhances the detectability of a luminescent (e.g., fluorescent) signal, for example
by increasing the strength or lifetime of the signal. One exemplary and preferred
chemiluminescent agent is azinobis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS). Others
include 5-amino-2,3-dihydro-1,4-phthalazinedione (luminol) and analogs, 1,2-dioxetanes such as
tetramethyl-1,2-dioxetane (TMD), 1,2-dioxetanones, and 1,2-dioxetanediones, o-anisidine,
o-dianisidine, and o-tolidine. Another term for these kinds of materials is "chromogen."

The term "polymer" means any substance or compound that is composed of two or more 
building blocks ('mers') that are repetitively linked to each other. For example, a "dimer" is a 
compound in which two building blocks have been joined together. 

The term "cofactor" means any non-protein substance that is necessary or beneficial to 
the activity of an enzyme. A "coenzyme" means a cofactor that interacts directly with and serves 
to promote a reaction catalyzed by an enzyme. Many coenzymes serve as carriers. For example, 
NAD+ and NADP+ carry hydrogen atoms from one enzyme to another. An "ancillary protein"
means any protein substance that is necessary or beneficial to the activity of an enzyme. 

The term "host cell" means any cell of any organism that is selected, modified, 
transformed, grown, or used or manipulated in any way, for the production of a substance by 
the cell, for example the expression by the cell of a gene, a DNA or RNA sequence, a protein or 
an enzyme. 

"DNA" (deoxyribonucleic acid) means any chain or sequence of the chemical building 
blocks adenine (A), guanine (G), cytosine (C) and thymine (T), called nucleotide bases, that are 
linked together on a deoxyribose sugar backbone. DNA can have one strand of nucleotide bases, 
or two complementary strands which may form a double helix structure. "RNA" (ribonucleic
acid) means any chain or sequence of the chemical building blocks adenine (A), guanine (G), 
cytosine (C) and uracil (U), called nucleotide bases, that are linked together on a ribose sugar 
backbone. RNA typically has one strand of nucleotide bases. 

A "polynucleotide" or "nucleotide sequence" is a series of nucleotide bases (also called 
"nucleotides") in DNA and RNA, and means any chain of two or more nucleotides. A nucleotide 
sequence typically carries genetic information, including the information used by cellular
machinery to make proteins and enzymes. These terms include double or single stranded
genomic and cDNA, RNA, any synthetic and genetically manipulated polynucleotide, and both
sense and anti-sense polynucleotide (although only sense strands are being represented herein).
This includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA- 
RNA hybrids, as well as "protein nucleic acids" (PNA) formed by conjugating bases to an amino 
acid backbone. This also includes nucleic acids containing modified bases, for example thio- 
uracil, thio-guanine and fluoro-uracil. 

The polynucleotides herein may be flanked by natural regulatory sequences, or may be 
associated with heterologous sequences, including promoters, enhancers, response elements, 
signal sequences, polyadenylation sequences, introns, 5'- and 3'-non-coding regions, and the like.
The nucleic acids may also be modified by many means known in the art. Non-limiting examples 
of such modifications include methylation, "caps", substitution of one or more of the naturally 
occurring nucleotides with an analog, and internucleotide modifications such as, for example, 
those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates,
carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.). 
Polynucleotides may contain one or more additional covalently linked moieties, such as, for 
example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), 
intercalators (e.g., acridine, psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, 
oxidative metals, etc.), and alkylators. The polynucleotides may be derivatized by formation of
a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. Furthermore, the 
polynucleotides herein may also be modified with a label capable of providing a detectable signal, 
either directly or indirectly. Exemplary labels include radioisotopes, fluorescent molecules, 
biotin, and the like. 

Proteins and enzymes are made in the host cell using instructions in DNA and RNA,
according to the genetic code. Generally, a DNA sequence having instructions for a particular
protein or enzyme is "transcribed" into a corresponding sequence of RNA. The RNA sequence
in turn is "translated" into the sequence of amino acids which form the protein or enzyme. An
"amino acid sequence" is any chain of two or more amino acids. Each amino acid is represented
in DNA or RNA by one or more triplets of nucleotides. Each triplet forms a codon,
corresponding to an amino acid. For example, the amino acid lysine (Lys) can be coded by the
nucleotide triplet or codon AAA or by the codon AAG. (The genetic code has some
redundancy, also called degeneracy, meaning that most amino acids have more than one
corresponding codon.) Because the nucleotides in DNA and RNA sequences are read in groups
of three for protein production, it is important to begin reading the sequence at the correct amino
acid, so that the correct triplets are read. The way that a nucleotide sequence is grouped into
codons is called the "reading frame."
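
As an illustration only, the relationship between codons and the reading frame described above can be sketched in Python. The function name `translate` and the example sequence are hypothetical and are not part of the disclosure; the codon table is the standard genetic code, built from the conventional TCAG ordering.

```python
# Standard genetic code, indexed by the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a DNA sense strand, reading triplets starting at offset `frame`."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


# Hypothetical sequence: Met, Lys (AAA), Lys (AAG), stop.
sequence = "ATGAAAAAGTAA"
in_frame = translate(sequence)           # "MKK*" ("*" marks a stop codon)
shifted = translate(sequence, frame=1)   # same nucleotides, different triplets
```

Reading the same nucleotides one position later groups them into different triplets, which is why the choice of reading frame determines the amino acid sequence obtained.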

The term "gene", also called a "structural gene", means a DNA sequence that codes for
or corresponds to a particular sequence of amino acids which comprise all or part of one or more
proteins or enzymes, and may or may not include regulatory DNA sequences, such as promoter
sequences, which determine for example the conditions under which the gene is expressed.
Some genes, which are not structural genes, may be transcribed from DNA to RNA, but are not
translated into an amino acid sequence. Other genes may function as regulators of structural
genes or as regulators of DNA transcription.

A "coding sequence" or a sequence "encoding" a polypeptide, protein or enzyme is a
nucleotide sequence that, when expressed, results in the production of that polypeptide, protein
or enzyme, i.e., the nucleotide sequence encodes an amino acid sequence for that polypeptide,
protein or enzyme. A coding sequence is "under the control" of transcriptional and translational
control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA,
which is then trans-RNA spliced and translated into the protein encoded by the coding sequence.
Preferably, the coding sequence is a double-stranded DNA sequence which is transcribed and
translated into a polypeptide in a cell in vitro or in vivo when placed under the control of
appropriate regulatory sequences. The boundaries of the coding sequence are determined by a
start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxyl) terminus.
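
The boundaries described above (a start codon at the 5' end and an in-frame stop codon at the 3' end) can be sketched as follows. This is a simplified illustration, not the method of the invention: it assumes the first ATG on the sense strand is the start codon, and the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_sequence(dna):
    """Return the stretch from the first ATG to the first in-frame stop codon.

    Returns None if there is no ATG or no in-frame stop codon after it.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    # Step through the sequence one codon at a time, in frame with the ATG.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]
    return None


# Hypothetical sequence with two flanking bases on each side.
cds = find_coding_sequence("CCATGAAAGGGTAACC")  # "ATGAAAGGGTAA"
```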

A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from
eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and even
synthetic DNA sequences. If the coding sequence is intended for expression in a eukaryotic cell,
a polyadenylation signal and transcription termination sequence will usually be located 3' to the
coding sequence.

Transcriptional and translational control sequences are DNA regulatory sequences, such
as promoters, enhancers, terminators, and the like, that provide for the expression of a coding
sequence in a host cell. In eukaryotic cells, polyadenylation signals are control sequences.

A "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase
in a cell and initiating transcription of a downstream (3' direction) coding sequence. For
purposes of defining this invention, the promoter sequence is bounded at its 3' terminus by the
transcription initiation site and extends upstream (5' direction) to include the minimum number
of bases or elements necessary to initiate transcription at levels detectable above background.
Within the promoter sequence will be found a transcription initiation site (conveniently defined,
for example, by mapping with nuclease S1), as well as protein binding domains (consensus
sequences) responsible for the binding of RNA polymerase. As described above, promoter DNA
is a DNA sequence which initiates, regulates, or otherwise mediates or controls the expression
of the coding DNA. A promoter may be "inducible", meaning that it is influenced by the
presence or amount of another compound (an "inducer"). For example, an inducible promoter
includes those which initiate or increase the expression of a downstream coding sequence in the
presence of a particular inducer compound. A "leaky" inducible promoter is a promoter that
provides a high expression level in the presence of an inducer compound and a comparatively
very low expression level, and at minimum a detectable expression level, in the absence of the
inducer.

A "signal sequence" is included at the beginning of the coding sequence of a protein to
be expressed in the periplasmic space, or outside the cell. This sequence encodes a signal
peptide, N-terminal to the mature polypeptide, that directs the host cell to translocate the
polypeptide. The term "translocation signal sequence" is also used to refer to a signal sequence.
Translocation signal sequences can be found associated with a variety of proteins native to
eukaryotes and prokaryotes, and are often functional in both types of organisms. Proteins of the
invention may be further modified and improved by adding a sequence which directs the
secretion of the protein outside the host cell. The addition of the signal sequence does not
interfere with the folding of the secreted protein, and evidence thereof is easily tested for using
techniques known in the art and depending on the protein (e.g., tests for activity of a given
protein after modification).

The terms "express" and "expression" mean allowing or causing the information in a gene
or DNA sequence to become manifest, for example producing a protein by activating the cellular
functions involved in transcription and translation of a corresponding gene or DNA sequence.
A DNA sequence is expressed in or by a cell to form an "expression product" such as a protein.
The expression product itself, e.g. the resulting protein, may also be said to be "expressed" by
the cell. A polynucleotide or polypeptide is expressed recombinantly, for example, when it is
expressed or produced in a foreign host cell under the control of a foreign or native promoter,
or in a native host cell under the control of a foreign promoter.

A polynucleotide or polypeptide is "over-expressed" when it is expressed or produced
in an amount or yield that is substantially higher than a given base-line yield, e.g. a yield that
occurs in nature. For example, a polypeptide is over-expressed when the yield is substantially
greater than the normal, average or base-line yield of the native polypeptide in native host
cells under given conditions, for example conditions suitable to the life cycle of the native host
cells. Over-expression of a polypeptide can be obtained, for example, by altering any one or
more of: (a) the growth or living conditions of the host cells; (b) the polynucleotide encoding the
polypeptide to be over-expressed; (c) the promoter used to control expression of the
polynucleotide; and (d) the host cells themselves. This is a relative term, and thus "over-expression"
can also be used to compare or distinguish the expression level of one polypeptide to another,
without regard for whether either polypeptide is a native polypeptide or is encoded by a native
polynucleotide. Typically, over-expression means a yield that is at least about two times a
normal, average or given base-line yield. Thus, a polypeptide is over-expressed when it is
produced in an amount or yield that is substantially higher than the amount or yield of a parent
polypeptide or under parent conditions. Likewise, a polypeptide is "under-expressed" when it
is produced in an amount or yield that is substantially lower than the amount or yield of a parent
polypeptide or under parent conditions, e.g. no more than half the base-line yield. In this context, the
expression level or yield refers to the amount or concentration of polynucleotide that is
expressed, or polypeptide that is produced (i.e. expression product), whether or not in an active
or functional form. As one example, a polynucleotide or polypeptide may be said to be under-expressed
when it is expressed in detectable amounts under the control of an inducible promoter,
but without induction, i.e. in the absence of an inducer compound.
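
The relative nature of these terms can be made concrete with a small sketch. It assumes the two-fold figure given above for over-expression and reads the under-expression example as a yield of no more than half the base-line yield; the function name, the labels and the symmetric threshold are illustrative assumptions, not definitions from the specification.

```python
def classify_expression(yield_amount, baseline, fold=2.0):
    """Classify a yield relative to a base-line yield (same units for both)."""
    if baseline <= 0:
        raise ValueError("base-line yield must be positive")
    ratio = yield_amount / baseline
    if ratio >= fold:
        return "over-expressed"
    if ratio <= 1.0 / fold:
        return "under-expressed"
    return "comparable to base-line"


# Hypothetical yields in arbitrary units against a base-line of 5.
labels = [classify_expression(y, 5.0) for y in (10.0, 6.0, 2.0)]
```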

An expression product can be characterized as intracellular, extracellular or secreted. The
term "intracellular" means something that is inside a cell. The term "extracellular" means
something that is outside a cell. A substance is "secreted" by a cell if it is delivered to the
periplasm or outside the cell, from somewhere on or inside the cell.

As used herein, the terms "expression-resistant polypeptide" and "resistant to functional 
expression" are synonymous and refer to a polypeptide that is difficult to functionally express 
in selected host cells. For example, an expression-resistant polypeptide is not produced, or is 
produced in very low yield or in non-functional form, when a polynucleotide encoding that 
polypeptide is transformed or introduced into host cells, e.g. into a facile host cell expression 
system. 

The term "transformation" means the introduction of a "foreign" (i.e. extrinsic or
extracellular) gene, DNA or RNA sequence to a host cell, so that the host cell will express the
introduced gene or sequence to produce a desired substance, typically a protein or enzyme coded
by the introduced gene or sequence. The introduced gene or sequence may also be called a
"cloned" or "foreign" gene or sequence, and may include regulatory or control sequences, such as
start, stop, promoter, signal, secretion, or other sequences used by a cell's genetic machinery.
The gene or sequence may include nonfunctional sequences or sequences with no known
function. A host cell that receives and expresses introduced DNA or RNA has been
"transformed" and is a "transformant" or a "clone." The DNA or RNA introduced to a host cell
can come from any source, including cells of the same genus or species as the host cell, or cells
of a different genus or species.

The terms "vector", "cloning vector" and "expression vector" mean the vehicle by which
a DNA or RNA sequence (e.g. a foreign gene) can be introduced into a host cell, so as to
transform the host and promote expression (e.g. transcription and translation) of the introduced
sequence.

Vectors typically comprise the DNA of a transmissible agent, into which foreign DNA
is inserted. A common way to insert one segment of DNA into another segment of DNA
involves the use of enzymes called restriction enzymes that cleave DNA at specific sites (specific
groups of nucleotides) called restriction sites. Generally, foreign DNA is inserted at one or more
restriction sites of the vector DNA, and then is carried by the vector into a host cell along with
the transmissible vector DNA. A segment or sequence of DNA having inserted or added DNA,
such as an expression vector, can also be called a "DNA construct."
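
A minimal sketch of locating a restriction site and inserting foreign DNA at it is given below. It uses the well-known EcoRI recognition sequence GAATTC (cut after the G) purely as an example; the specification does not name a particular enzyme. The model treats DNA as a single linear top strand and ignores the complementary strand and ligation, and the vector and fragment sequences are hypothetical.

```python
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts the top strand after the G: G^AATTC


def find_sites(dna, site=ECORI_SITE):
    """Return the start index of every occurrence of `site` in `dna`."""
    positions = []
    index = dna.find(site)
    while index != -1:
        positions.append(index)
        index = dna.find(site, index + 1)
    return positions


def insert_at_site(vector, fragment, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Insert `fragment` at the cut position of the first `site` in `vector`."""
    positions = find_sites(vector, site)
    if not positions:
        raise ValueError("vector has no such restriction site")
    cut = positions[0] + cut_offset
    return vector[:cut] + fragment + vector[cut:]


vector = "CCGAATTCGG"            # hypothetical vector with one EcoRI site
fragment = "AATTCATGAAATAAG"     # hypothetical insert with EcoRI-compatible ends
construct = insert_at_site(vector, fragment)
```

Because the hypothetical fragment carries EcoRI-compatible ends, the resulting construct has a restored site on each side of the inserted DNA.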

A common type of vector is a "plasmid", which generally is a self-contained molecule
of double-stranded DNA that can readily accept additional (foreign) DNA and which can readily
be introduced into a suitable host cell. A plasmid vector often contains coding DNA and promoter
DNA and has one or more restriction sites suitable for inserting foreign DNA. Promoter DNA
and coding DNA may be from the same gene or from different genes, and may be from the same
or different organisms. A large number of vectors, including plasmid and fungal vectors, have
been described for replication and/or expression in a variety of eukaryotic and prokaryotic hosts.
Non-limiting examples include pKK plasmids (Clontech), pUC plasmids, pET plasmids
(Novagen, Inc., Madison, WI), pRSET or pREP plasmids (Invitrogen, San Diego, CA), or
pMAL plasmids (New England Biolabs, Beverly, MA), and many appropriate host cells, using
methods disclosed or cited herein or otherwise known to those skilled in the relevant art.
Recombinant cloning vectors will often include one or more replication systems for cloning or
expression, one or more markers for selection in the host, e.g. antibiotic resistance, and one or
more expression cassettes. Routine experimentation in biotechnology can be used to determine
which vectors are best suited for use with the invention. In general, the choice of vector
depends on the size of the polynucleotide sequence and the host cell to be employed in the
methods of this invention.

A "cassette" refers to a segment of DNA that can be inserted into a vector at specific
restriction sites. The segment of DNA encodes a polypeptide of interest, and the cassette and
restriction sites are designed to ensure insertion of the cassette in the proper reading frame for
transcription and translation.

The term "expression system" means a host cell and compatible vector under suitable
conditions, e.g. for the expression of a protein coded for by foreign DNA carried by the vector
and introduced to the host cell. Common expression systems include bacteria (e.g. E. coli and
B. subtilis) or yeast (e.g. S. cerevisiae) host cells and plasmid vectors, and insect host cells and
Baculovirus vectors. As used herein, a "facile expression system" means any expression system
that is foreign or heterologous to a selected polynucleotide or polypeptide, and which employs
host cells that can be grown or maintained more advantageously than cells that are native or
heterologous to the selected polynucleotide or polypeptide, or which can produce the
polypeptide more efficiently or in higher yield. For example, the use of robust prokaryotic cells
to express a protein of eukaryotic origin would be a facile expression system. Preferred facile
expression systems include E. coli, B. subtilis and S. cerevisiae host cells and any suitable vector.

The terms "mutant" and "mutation" mean any detectable change in genetic material, e.g.
DNA, or any process, mechanism, or result of such a change. This includes gene mutations, in
which the structure (e.g. DNA sequence) of a gene is altered, any gene or DNA arising from any
mutation process, and any expression product (e.g. protein or enzyme) expressed by a modified
gene or DNA sequence. The term "variant" may also be used to indicate a modified or altered
gene, DNA sequence, enzyme, cell, etc., i.e., any kind of mutant. Such changes also include
changes in the promoter, ribosome binding site, etc.

"Sequence-conservative variants" of a polynucleotide sequence are those in which a 
change of one or more nucleotides in a given codon position results in no alteration in the amino
acid encoded at that position.
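
Whether a nucleotide change is sequence-conservative in this sense can be checked by translating both sequences and comparing the products. The sketch below assumes equal-length, in-frame coding sequences and the standard genetic code; the function names and example sequences are hypothetical.

```python
# Standard genetic code, indexed by the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def is_sequence_conservative(parent, variant):
    """True if the variant differs in nucleotides but encodes the same amino acids."""
    parent, variant = parent.upper(), variant.upper()
    if len(parent) != len(variant) or parent == variant:
        return False
    return translate(parent) == translate(variant)


silent = is_sequence_conservative("ATGAAAGGG", "ATGAAGGGG")   # AAA -> AAG, both Lys
changed = is_sequence_conservative("ATGAAAGGG", "ATGAATGGG")  # AAA -> AAT, Lys -> Asn
```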

"Function-conservative variants" are those in which a given amino acid residue in a 
protein or enzyme has been changed without altering the overall conformation and function of
the polypeptide, including, but not limited to, replacement of an amino acid with one having
similar properties (such as, for example, acidic, basic, hydrophobic, and the like). Amino acids
with similar properties are well known in the art. For example, arginine, histidine and lysine are
hydrophilic-basic amino acids and may be interchangeable. Similarly, isoleucine, a hydrophobic
amino acid, may be replaced with leucine, methionine or valine. Amino acids other than those
indicated as conserved may differ in a protein or enzyme so that the percent protein or amino
acid sequence similarity between any two proteins of similar function may vary and may be, for
example, from 70% to 99% as determined according to an alignment scheme such as by the
Cluster Method, wherein similarity is based on the MEGALIGN algorithm. A "function-conservative
variant" also includes a polypeptide or enzyme which has at least 60% amino acid
identity as determined by BLAST or FASTA algorithms, preferably at least 75%, most preferably
at least 85%, and even more preferably at least 90%, and which has the same or substantially
similar properties or functions as the native or parent protein or enzyme to which it is compared.
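
The percent-identity thresholds above can be illustrated with a simplified calculation on two sequences that have already been aligned. This is only a sketch: BLAST and FASTA compute the alignment themselves and may define the denominator differently, and the function names and peptide sequences here are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over aligned, non-gap positions.

    Both inputs must already be aligned to equal length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def meets_identity_threshold(aligned_a, aligned_b, threshold=60.0):
    """Check the identity criterion only; function must be assessed separately."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical ten-residue peptides differing at one position (Ile -> Leu).
identity = percent_identity("MKTAYIAKQR", "MKTAYLAKQR")
```

Identity alone does not make a variant function-conservative; the definition above also requires the same or substantially similar properties or functions.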

The term "DNA reassembly" is used when recombination occurs between identical
sequences. "DNA shuffling" refers to a group of in vitro or in vivo methods involving
recombination of nucleic acid species. For example, homologous recombination of pools of
nucleic acid fragments or polynucleotides can be employed to generate polynucleotide molecules
having variant sequences of the invention.

"Isolation" or "purification" of a polypeptide or enzyme refers to the derivation of the 
polypeptide by removing it from its original environment (for example, from its natural
environment if it is naturally occurring, or from the host cell if it is produced by recombinant
DNA methods). Methods for polypeptide purification are well-known in the art, including,
without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC,
reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, and countercurrent
distribution. For some purposes, it is preferable to produce the polypeptide in a recombinant
system in which the protein contains an additional sequence tag that facilitates purification, such
as, but not limited to, a polyhistidine sequence. The polypeptide can then be purified from a
crude lysate of the host cell by chromatography on an appropriate solid-phase matrix.
Alternatively, antibodies produced against the protein or against peptides derived therefrom can
be used as purification reagents. Other purification methods are possible. A purified
polynucleotide or polypeptide may contain less than about 50%, preferably less than about 75%,
and most preferably less than about 90%, of the cellular components with which it was originally
associated. A "substantially pure" enzyme indicates the highest degree of purity which can be
achieved using conventional purification techniques known in the art.

Polynucleotides are "hybridizable" to each other when at least one strand of one
polynucleotide can anneal to another polynucleotide under defined stringency conditions.
Stringency of hybridization is determined, e.g., by a) the temperature at which hybridization
and/or washing is performed, and b) the ionic strength and polarity (e.g., formamide) of the
hybridization and washing solutions, as well as other parameters. Hybridization requires that the
two polynucleotides contain substantially complementary sequences; depending on the stringency
of hybridization, however, mismatches may be tolerated. Typically, hybridization of two
sequences at high stringency (such as, for example, in an aqueous solution of 0.5X SSC at 65°C)
requires that the sequences exhibit a high degree of complementarity over their entire
sequence. Conditions of intermediate stringency (such as, for example, an aqueous solution of
2X SSC at 65°C) and low stringency (such as, for example, an aqueous solution of 2X SSC at
55°C), require correspondingly less overall complementarity between the hybridizing sequences.
(1X SSC is 0.15 M NaCl, 0.015 M Na citrate.) Polynucleotides that "hybridize" to the
polynucleotides herein may be of any length. In one embodiment, such polynucleotides are at
least 10, preferably at least 15 and most preferably at least 20 nucleotides long. In another
embodiment, polynucleotides that hybridize are of about the same length. In another
embodiment, polynucleotides that hybridize include those which anneal under suitable stringency
conditions and which encode polypeptides or enzymes having the same function, such as the
ability to catalyze an oxidation, oxygenase, or coupling reaction of the invention.
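
The salt concentrations implied by the example stringency conditions follow directly from the stated definition of 1X SSC (0.15 M NaCl, 0.015 M Na citrate). The sketch below simply scales those figures; the dictionary and function names are illustrative.

```python
NACL_M_PER_X = 0.15       # mol/l NaCl in 1X SSC, as stated above
CITRATE_M_PER_X = 0.015   # mol/l Na citrate in 1X SSC, as stated above

# Example conditions quoted in the text: (SSC strength, temperature in deg C).
STRINGENCY_EXAMPLES = {
    "high": (0.5, 65),
    "intermediate": (2.0, 65),
    "low": (2.0, 55),
}


def ssc_molarity(strength_x):
    """Return (NaCl, Na citrate) molarity for a given SSC strength."""
    return strength_x * NACL_M_PER_X, strength_x * CITRATE_M_PER_X


for name, (strength, temp_c) in STRINGENCY_EXAMPLES.items():
    nacl, citrate = ssc_molarity(strength)
    print(f"{name}: {strength}X SSC at {temp_c} C = "
          f"{nacl:.3f} M NaCl, {citrate:.4f} M Na citrate")
```

In these examples, stringency rises as the salt concentration falls at a fixed temperature (0.5X versus 2X SSC at 65°C) and as the temperature rises at a fixed salt concentration (65°C versus 55°C in 2X SSC).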

The general genetic engineering tools and techniques discussed here, including
transformation and expression, the use of host cells, vectors, expression systems, etc., are well
known in the art.

Mutagenesis and Directed Evolution of Proteins 

To improve the expression and function of proteins using conventional expression
systems, the invention makes the unexpected discovery that directed evolution can be used to
generate mutant libraries of polynucleotides which, when expressed using conventional or facile
expression systems, result in functional proteins having increased activity and/or thermostability.

According to the invention, proteins that are expressed in facile gene expression systems
can be obtained by using directed evolution to generate mutant polynucleotides in a library
format for selection. General methods for generating libraries and isolating and identifying
improved proteins (also described as "variants") according to the invention using directed
evolution are described briefly below and more extensively, for example, in U.S. Patent Nos.
5,741,691 and 5,811,238. See also, International Applications WO 98/42832, WO 95/22625,
WO 97/20078, and WO 95/ and U.S. Patents 5,605,793 and 5,830,721 (143, 149-156). It
should be understood that any method for generating mutations in polynucleotide sequences to
provide an evolved polynucleotide for use in expression systems can be employed. Proteins
produced by directed evolution methods can then be screened for improved expression, activity,
thermostability, folding, secretion, and other functions and properties according to conventional
methods.

Any source of nucleic acid in purified form can be utilized as the starting nucleic acid.
Thus the process may employ DNA or RNA including messenger RNA, which DNA or RNA
may be single or double stranded. In addition, a DNA-RNA hybrid which contains one strand
of each may be utilized. The nucleic acid sequence may be of various lengths depending on the
size of the nucleic acid sequence to be mutated. Preferably the specific nucleic acid sequence is
from 50 to 50,000 base pairs. It is contemplated that entire vectors containing the nucleic acid
encoding the protein of interest may be used in the methods of this invention.

Any specific nucleic acid sequence can be used to produce the population of mutants by
the present process. An initial population of the specific nucleic acid sequences having mutations
may be created by a number of different known methods, some of which are set forth below.

Error-prone polymerase chain reaction (20,45,46) and cassette mutagenesis (38-44), in
which the specific region optimized is replaced with a synthetically mutagenized oligonucleotide,
can be employed in the invention. Error-prone PCR can be used to mutagenize a mixture of
fragments of unknown sequence. These techniques can also be employed under low-fidelity
polymerization conditions to introduce a low level of point mutations randomly over a long
sequence, or to mutagenize a mixture of fragments of unknown sequence.

Oligonucleotide-directed mutagenesis, which replaces a short sequence with a
synthetically mutagenized oligonucleotide, may also be employed to generate evolved
polynucleotides having improved expression.
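
The effect of error-prone PCR described above, a low level of point mutations introduced randomly over a sequence, can be pictured with a small simulation. This is a conceptual model only and does not reproduce polymerase error spectra; the function names, the parent sequence, the library size and the per-base rate are all illustrative assumptions.

```python
import random


def mutate(sequence, rate, rng):
    """Copy `sequence`, substituting each base with probability `rate`."""
    bases = "ACGT"
    result = []
    for base in sequence:
        if rng.random() < rate:
            # Substitute with one of the three other bases, chosen at random.
            result.append(rng.choice([b for b in bases if b != base]))
        else:
            result.append(base)
    return "".join(result)


def make_library(parent, size, rate, seed=0):
    """Generate `size` independently mutated copies of `parent`."""
    rng = random.Random(seed)
    return [mutate(parent, rate, rng) for _ in range(size)]


# Hypothetical parent sequence and mutation rate.
parent = "ATGAAAGGGCCCTTTTAA"
library = make_library(parent, size=5, rate=0.05)
```

A fixed seed makes the simulated library reproducible; in a laboratory setting the mutation rate is instead tuned through the reaction conditions.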

Alternatively, nucleic acid or DNA shuffling, which uses a method of in vitro or in vivo,
generally homologous, recombination of pools of nucleic acid fragments or polynucleotides, can
be employed to generate polynucleotide molecules having variant sequences of the invention.

Parallel PCR is another method that can be used to evolve polynucleotides for improved
expression, function or properties in conventional expression systems, which uses a large number
of different PCR reactions that occur in parallel in the same vessel, such that the product of one
reaction primes the product of another reaction. Sequences can be randomly mutagenized at
various levels by random fragmentation and reassembly of the fragments by mutual priming.
Site-specific mutations can be introduced into long sequences by random fragmentation of the
template followed by reassembly of the fragments in the presence of mutagenic oligonucleotides.

A particularly useful application of parallel PCR, which can be used in the invention, is
called sexual PCR. In sexual PCR, also known as DNA shuffling, parallel PCR is used to
perform in vitro recombination on a pool of DNA sequences. Sexual PCR can also be used to
construct libraries of chimaeras of genes from different species.
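
The outcome of DNA shuffling, chimaeric sequences assembled from segments of different parents, can be sketched abstractly as follows. The model assumes equal-length, pre-aligned parent sequences and switches template at random crossover points; it does not simulate fragmentation or reassembly by mutual priming, and all names and sequences are hypothetical.

```python
import random


def shuffle_parents(parents, n_crossovers, rng):
    """Build one chimaera by copying consecutive segments from random parents."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    # Choose distinct interior crossover points and sort them along the sequence.
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        template = rng.choice(parents)   # parent copied for this segment
        segments.append(template[start:end])
    return "".join(segments)


# Two hypothetical parents, written as all-A and all-C so segments are visible.
parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC"]
rng = random.Random(1)
chimaeras = [shuffle_parents(parents, 3, rng) for _ in range(4)]
```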

The polynucleotide sequences for use in the invention can also be altered by chemical
mutagenesis. Chemical mutagens include, for example, sodium bisulfite, nitrous acid,
hydroxylamine, hydrazine or formic acid. Other agents which are analogues of nucleotide
precursors include nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine. Generally, these
agents are added to the PCR reaction in place of the nucleotide precursor thereby mutating the
sequence. Intercalating agents such as proflavine, acriflavine, quinacrine and the like can also
be used. Random mutagenesis of the polynucleotide sequence can also be achieved by irradiation
with X-rays or ultraviolet light, or by subjecting the polynucleotide to propagation in a host
(such as E. coli) that is deficient in the normal DNA damage repair function. Generally, plasmid
DNA or DNA fragments so mutagenized are introduced into E. coli and propagated as a pool
or library of mutant plasmids.

Alternatively, a mixed population of specific nucleic acids may be found in nature in that
they may consist of different alleles of the same gene or the same gene from different related
species (i.e., cognate genes). Alternatively, they may be related DNA sequences found within
one species, for example, the oxidase class of genes. Once the mixed population of the specific
nucleic acid sequences is generated, the polynucleotides can be used directly or inserted into an
appropriate cloning vector, using techniques well-known in the art.

Once the evolved polynucleotide molecules are generated they can be cloned into a
suitable vector selected by the skilled artisan according to methods well known in the art. If a
mixed population of the specific nucleic acid sequence is cloned into a vector it can be clonally
amplified by inserting each vector into a host cell and allowing the host cell to amplify the vector.
The mixed population may be tested to identify the desired recombinant nucleic acid fragment.
The method of selection will depend on the DNA fragment desired. For example, in this
invention a DNA fragment which encodes for a protein with improved properties can be
determined by tests for functional activity and/or stability of the protein. Such tests are well
known in the art.

Using the methods of directed evolution, the invention provides a novel means for
producing functional and soluble proteins with improved activity toward one or more substrates.
The mutants can be expressed in conventional or facile expression systems such as E. coli.
Conventional tests can be used to determine whether a protein of interest produced from an
expression system has improved expression, folding and/or functional properties. For example,
to determine whether a polynucleotide subjected to directed evolution and expressed in a foreign
host cell produces a protein with improved activity, one skilled in the art can perform
experiments designed to test the functional activity of the protein. Briefly, the evolved protein
can be rapidly screened, and is readily isolated and purified from the expression system or media
if secreted. It can then be subjected to assays designed to test functional activity of the particular
protein in native form. Such experiments for various proteins are well known in the art, and are
discussed in the Examples below.

In one embodiment, the invention contemplates the use of polynucleotides encoding
variants of oxidase enzymes. The invention employs directed evolution to generate novel
oxidase enzymes, such as GAO, which are expressed in host cells (e.g. E. coli) used in an
expression system, and which exhibit increased functional activity and increased thermostability.

The invention can also be applied to select or optimize an expression system, including 
selection of host cells, promoters, and signal sequences. Expression conditions can also be 
optimized according to the invention. 

Directed Evolution of Galactose Oxidase 

Galactose oxidase (EC 1.1.3.9) is an alcohol oxidase enzyme. It oxidizes the hydroxyl
group of the sixth carbon of D-galactose. It also oxidizes many other kinds of sugars and
alcohols (77, 108, 114, 115, 118-120). Although many fungi produce galactose oxidase, no
bacterium has been reported to produce the enzyme (109). There are many reports about
galactose oxidase from Fusarium spp. NRRL2903, which is identical to Dactylium dendroides
ATCC46032 (76-78, 84-86, 88, 95, 99, 108, 110-128). FIG. 1. The native enzyme is an
extracellular monomeric enzyme with a molecular weight of 67,000. It has one copper (II) ion
associated with its active site, which is related to its oxidation properties. FIG. 2. The structure and
amino acid residues related to catalysis have been characterized and reported (76, 78, 84-86, 88,
111-113, 116-119).

Galactose oxidase is currently used mainly for assays of D-galactose and
D-galactosamine. The enzyme oxidizes the hydroxyl group in the substrate to an aldehyde, which
is reactive. Therefore, the enzyme is implicated for use in production of non-natural sugars and
derivatives of sugars (118, 119, 95, 99, 128). Hyper-production of galactose oxidase would be
useful for a wide variety of applications. The gene of the galactose oxidase has been cloned
(110) and expressed in Escherichia coli (127). This recombinant galactose oxidase was
produced as a fused protein with the N-terminal sequence of LacZ. However, the yield of the
galactose oxidase by this recombinant E. coli was not satisfactory.

According to the invention, galactose oxidase enzyme (GAO) has been produced in high
activity and with improved properties by recombinant techniques in E. coli.

The following Examples are understood to be exemplary only, and do not limit the scope
of the invention or the appended claims. A person of ordinary skill in the art will appreciate that
the invention can be practiced in many forms according to the claims and disclosures here.

EXAMPLE 1

Activity Assays for Galactose Oxidase Expressed in E. coli

This Example describes assays used for evaluating galactose oxidase activity. Galactose 
oxidase generates equimolar amounts of hydrogen peroxide by oxidation of a substrate. 
Colorimetric detection of hydrogen peroxide was therefore used to assay galactose oxidase 
activity, employing the following reaction scheme: 

    R-CH2OH + O2  --(GAO)-->  R-CHO + H2O2 
    H2O2 + chromogen  --(peroxidase)-->  H2O + color change 

This system can be used to assay for oxidation of various substrates, with a very high 
sensitivity. In the reaction scheme above, an alcohol group of a substrate R is oxidized to 
produce an aldehyde, and hydrogen peroxide (H2O2) is released. For example, D-galactose is 
converted to D-galactohexodialdose plus H2O2. The chromogen, in the presence of hydrogen 
peroxide and a peroxidase enzyme, e.g. horseradish peroxidase (HRP), produces a detectable color 
change, indicating that the reaction catalyzed by GAO has occurred. 

A. Test Tube Assay 

The activity of galactose oxidase produced in E. coli was investigated using fungal 
galactose oxidase (Sigma, partially purified) as a standard. For detecting hydrogen peroxide with 
peroxidase (Sigma, type I from horseradish), a chromogen was selected for the GAO assays (85). 

1. Materials 

Cells. E. coli DH5αMCR (Life Technologies) was used for gene manipulation. E. coli 
BL21(DE3) (Novagen) was used as a host strain for expression of the galactose oxidase gene. E. 
coli KY-14478 (SN0029, lacking catalase, Kyowa Hakko Kogyo Co., Ltd.) was also used for 
manipulation and expression of genes (157). Competent cells for electroporation were prepared 
as described (147). 

Cultivation Media. Luria-Bertani (LB) medium (10 g/l bacto tryptone, 5 g/l bacto yeast 
extract, 10 g/l NaCl, pH 7.5) was used mainly for cultivation of E. coli (19). LB plates 
contained 15 g/l agar in LB medium. Ampicillin (100 mg/l) was added to the medium when 
required. 
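
The medium composition above scales linearly with volume. The following is a minimal sketch of that arithmetic, assuming the garbled units in the recipe read as grams (or milligrams) per litre; the function name and dictionary keys are illustrative only and are not part of the original disclosure.

```python
# Recipe as stated in the text, read as grams per litre of medium.
LB_G_PER_L = {"bacto tryptone": 10.0, "bacto yeast extract": 5.0, "NaCl": 10.0}
AGAR_G_PER_L = 15.0           # added for LB plates
AMPICILLIN_MG_PER_L = 100.0   # added when required


def lb_recipe(volume_l, plates=False, ampicillin=False):
    """Return component amounts for `volume_l` litres of LB.

    Amounts are in grams, except ampicillin, which is in milligrams.
    """
    recipe = {name: grams * volume_l for name, grams in LB_G_PER_L.items()}
    if plates:
        recipe["agar"] = AGAR_G_PER_L * volume_l
    if ampicillin:
        recipe["ampicillin (mg)"] = AMPICILLIN_MG_PER_L * volume_l
    return recipe


if __name__ == "__main__":
    # Half a litre of LB-ampicillin agar for plates.
    for component, amount in lb_recipe(0.5, plates=True, ampicillin=True).items():
        print(f"{component}: {amount}")
```

The pH adjustment to 7.5 is not modeled here.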

Buffers. Solutions of sodium phosphate, potassium phosphate and Tris-HCl at various 
pHs were tested as buffer solutions for the assay. 

Chromogens. Many aromatic compounds can be used as a chromogen for the assay. 
Four chromogens showed particularly strong color formation (green, orange, red and red, 
respectively): (a) 2,2'-azinobis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) (85); (b) o-anisidine; 
(c) o-dianisidine (127, 123, 121, 122); and (d) o-tolidine (114, 119). Their absorbance 
peaks were at 410 nm, 490 nm, 460 nm and 420 nm, respectively. 
2. Methods 

Cultivation. Three steps of cultivation were performed for production of galactose 
oxidase. Recombinant E. coli strains were cultivated on an LB plate containing ampicillin at 30 °C 
for 18 hours. The cells were inoculated into LB containing ampicillin. After cultivation at 30 °C 
for 12 hours, the culture was transferred to a new test tube containing 3 ml LB supplemented with 
ampicillin. The inoculation rate was 0.5 % of the medium volume. Isopropyl beta-D-thiogalactopyranoside 
(IPTG) (1 mM) was added for induction after cultivation at 30 °C for 7 hours. Cultivation was 
then continued at 30 °C for 6 hours. 

Permeabilization. Permeable cells were prepared by freezing (-20 °C) and thawing (4 °C), 
followed by treatment with 0.5 mg/l lysozyme (Sigma, from chicken egg white) for 30 minutes at 37 °C. 
This permeabilization pre-treatment was used for the assay in evaluation of recombinant galactose 
oxidase (Example 3). 

Activity assay. The extract was assayed for galactose oxidase activity. Copper (II) 
sulfate solution (0.4 mM) was added to the cell-free extract. The cell-free extract was diluted 
in the buffer solution. Peroxidase (Sigma, type I from horseradish) (10 units/ml) and 
azinobis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) (2 g/l) were added to the reaction solution. The 
reaction solution was pre-incubated at 37 °C for 5 minutes. Substrate was added to the solution 
to a final concentration of 100 mM. The increase in absorbance (410 nm or 405 nm) was measured at 37 °C for 1 
minute. Fungal galactose oxidase (Sigma, partially purified) was used as a standard for estimation 
of the activity. 
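
Because activity is estimated against a standard enzyme preparation, the calculation can be sketched as a simple ratio of absorbance rates. The following is a minimal illustration, assuming that the rate of absorbance increase is proportional to activity over the range measured and that the standard's activity is known; the function names, the dilution-factor parameter and the numerical values are hypothetical and are not taken from the original experiments.

```python
def absorbance_rate(a_start, a_end, minutes=1.0):
    """Rate of absorbance increase (absorbance units per minute)."""
    if minutes <= 0:
        raise ValueError("measurement time must be positive")
    return (a_end - a_start) / minutes


def activity_from_standard(sample_rate, standard_rate,
                           standard_units_per_ml, dilution_factor=1.0):
    """Estimate sample activity (units/ml) by comparison with a standard.

    Assumes a linear relation between absorbance rate and activity.
    `dilution_factor` corrects for dilution of the sample before the assay.
    """
    if standard_rate <= 0:
        raise ValueError("standard rate must be positive")
    return sample_rate / standard_rate * standard_units_per_ml * dilution_factor


if __name__ == "__main__":
    # Hypothetical readings over the 1-minute measurement window.
    sample = absorbance_rate(0.10, 0.40)     # diluted cell-free extract
    standard = absorbance_rate(0.10, 0.25)   # standard at 0.5 units/ml
    print(activity_from_standard(sample, standard, 0.5, dilution_factor=10))
```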

3. Results 

From these experiments, ABTS was selected as a preferred chromogen for these types 
of assays, since ABTS formed its color most strongly and sensitively. Moreover, the highest 
assay sensitivity and lowest background were achieved when using a 100 mM sodium phosphate 
buffer solution (pH 7.0) for the assay. 

The minimum detectable activity of galactose oxidase for this assay system was 0.05 units/ml. 
Galactose oxidase activity between 0.1 and 1 units/ml was measured quantitatively with a 
photometer at 410 nm or 405 nm. 

Catalase produced by E. coli degrades hydrogen peroxide and may influence the assay. 
In practice, catalase was not observed to pose a problem, because the activity of the galactose 
oxidase was much higher than that of catalase. 

Provided below are additional galactose oxidase screening techniques and/or activity 
assays, having the following advantages: high specificity for galactose oxidase, high sensitivity, 
good reproducibility, quantitative measurements, simplicity, flexibility for many substrates, and 
low cost. One screening system utilizes microplates and the other utilizes membranes. Both 
systems use horseradish peroxidase (type I, Sigma) together with a chromogen (ABTS). 



B. Microplate Screening Method 

The following micro-plate assay has a high sensitivity. Moreover, the enzyme activity 
can be determined quantitatively. To increase throughput, the method can be automated, for 
example robotically. This method is particularly suitable as a second screen, after active clones 
are identified by a more rapid first screen, such as a membrane screen. In experiments using 
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these procedures, the active cultures on the microplate had galactose oxidase activity as indicated 
by strong green color formation, where each positive well on the microplate was visible as a dark 
circle. GAO activity was screened in 96-well plates. 

Briefly, single colonies were picked from LB-Ampicillin (LB-Ap) agar plates into deep-well 
plates and grown in LB-Ap. The master plates were duplicated into new deep-well plates 
containing LB-Ap-1 mM IPTG. Following cultivation at 30 °C, CuSO4 was added and the cells 
were lysed with lysozyme and SDS. Cell extracts were reacted with galactose and allyl alcohol 
using the GAO-HRP coupled assay described above. 
1. Methods for Approach A 

Single colonies were picked from Luria-Bertani/100 µg/ml ampicillin (LB-Ap) agar plates 
into deep-well polypropylene plates (well depth: 2.4 cm; volume: 1 ml; from Becton Dickinson 
Labware) and cells were grown for 10 h at 30 °C and 270 rpm in 200 µl LB-Ap. The master 
plates were duplicated by transferring a 10 µl aliquot to a new deep-well plate containing 300 
µl LB-Ap and 1 mM isopropyl-beta-D-thiogalactopyranoside (IPTG) and grown for 12 h at 30 
°C and 250 rpm. The cultures were then centrifuged for 10 min at 5000 rpm and the cell pellet 
was resuspended in 300 µl 100 mM sodium phosphate (NaPi) buffer, pH 7.0, containing 0.4 mM 
CuSO4. Following addition of 0.5 mg/ml lysozyme (35 min at 37 °C) and 2.5% (w/v) SDS 
(overnight at 4 °C), the GAO activity was assayed using the GAO-horseradish peroxidase (HRP) 
coupled assay (85). Aliquots of the cell extracts were reacted with galactose (50 mM for 
generation A1 or 25 mM for generations A2 and A3) and allyl alcohol (0.5 M for all generations) 
at pH 7.0. The initial rate of H2O2 formation was followed by monitoring the HRP-catalyzed 
oxidation of 2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid) (ABTS) at 405 nm. To assay 
thermostability, the plates were heated at a given temperature for 10 min, cooled down on ice 
for 10 min, and allowed to reach room temperature for ca. 5 min before the activity toward 
galactose was measured. The thermostability index was determined from the ratio of the residual 
GAO activity to the initial activity. Mutants identified as thermostable were then grown in test 
tubes (3 ml cultures) and the residual activity after heating at various temperatures was measured 
at room temperature. 
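
The thermostability index defined above is a simple ratio, and clones can be ranked by it. The following is a minimal sketch under that definition; the function names, clone labels and activity values are hypothetical and serve only to illustrate the calculation.

```python
def thermostability_index(residual_activity, initial_activity):
    """Ratio of residual GAO activity (after heating) to initial activity."""
    if initial_activity <= 0:
        raise ValueError("initial activity must be positive")
    return residual_activity / initial_activity


def rank_clones(measurements):
    """Rank clones by thermostability index, highest first.

    `measurements` maps a clone label to (initial, residual) activities
    measured in the same units.
    """
    indexed = {
        clone: thermostability_index(residual, initial)
        for clone, (initial, residual) in measurements.items()
    }
    return sorted(indexed.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical plate readings: (initial activity, residual activity).
    plate = {"wild type": (1.0, 0.25), "clone A": (2.0, 1.5), "clone B": (4.0, 0.5)}
    for clone, index in rank_clones(plate):
        print(clone, index)
```

Ranking by the ratio rather than by residual activity alone separates a stability gain from a simple increase in expression or specific activity.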

2. Methods for Approach B 

Single colonies were picked from LB-Ap agar plates into deep-well polypropylene plates 
(well depth: 4.4 cm; volume: 2.2 ml; from Qiagen) and cells were grown for 8 h at 30 °C and 
270 rpm in 500 µl LB-Ap. The master plates were duplicated by transferring a 10 µl aliquot to 
a new deep-well plate containing 500 µl LB-Ap-1 mM IPTG and grown overnight at 30 °C and 
270 rpm. An aliquot of the culture was transferred to a microtiter plate. Following addition of 
0.5 mg/ml lysozyme (30 min at 37 °C) and 0.4% (w/v) SDS - 0.4 mM CuSO4 in 100 mM NaPi buffer, pH 
7.0 (4 h at 4 °C), the GAO activity was assayed using the GAO-HRP coupled assay as described 
above. The galactose concentration used was 25 mM (generations B1 and B2) or 10 mM 
(generations B3 and B4). 

C. Membrane Screening Method 

Although the micro-plate screening system is highly sensitive and quantitative, it is 
desirable to provide a method that contemporaneously assays many more clones, e.g. thousands more, 
in a sensitive, accurate, practical and efficient manner. Methods for detection of galactose 
oxidase activity directly from colonies on an agar plate were examined, but were found to exhibit 
relatively low sensitivity, low reproducibility, and very slow color formation. Hence, to evaluate 
a very large number of mutants, methods for detection of their activities directly from colonies on 
an agar plate or from colonies transferred onto a membrane were examined. These methods were 
based on colorimetric detection using a chromogen and peroxidase, as in the micro-plate screening 
system. 

A suitable screening method using membranes was developed, and is shown here in one 
optimized form. After transformants formed colonies on an LB-Ap plate (100 mg/l, at 30 °C for 
18-24 hours), these colonies were transferred to a membrane, i.e., they were adsorbed onto the 
membrane and lifted. For cultivation, the membrane was placed on a new LB-Ap plate (100 
mg/l) and was incubated at 30 °C until new colonies were formed on the membrane (6-12 hours). 
The membrane was then transferred to a new LB-Ap (100 mg/l) plate with 1 mM IPTG and kept at 30 
°C for 6 hours for induction. Then the membrane was placed, at room 
temperature, on a filter paper containing lysozyme (0.5 mg/ml), D-galactose (100 mM), ABTS (2 mg/ml), 
peroxidase (10 units/ml) and CuSO4 (0.4 mM). In experiments using these procedures, colonies 
which had galactose oxidase activity showed as deep purple on the filter paper. This simple 
method has suitable sensitivity and can be used to evaluate several thousand colonies on one 
membrane at once. 

This method can be used with an image analyzer for quantitative determination of the activity of 
each colony. Although the sensitivity of this method is not as high as that of the others, the method is fast 
and is suitable for a first or initial screening, because many thousands or even millions of colonies 
can be contemporaneously or rapidly evaluated. 

In a preferred embodiment, galactose oxidase activities of colonies which were 
transferred onto a membrane were estimated directly. Colonies, which were formed on an LB-ampicillin 
plate at 30 °C for 24 hours, were transferred onto a membrane (Immobilon NC 
(HATF), surfactant-free, 0.45 µm, 82 mm, Millipore). The membrane was put on a new LB-ampicillin 
plate and was kept at 30 °C for 6-12 hours until colonies were re-formed. Then the 
membrane was transferred onto an LB-ampicillin plate containing 1 mM IPTG and was incubated 
for 6 hours at 30 °C. The membrane was then put on filter paper containing 0.5 mg/l lysozyme, 
100 mM substrate, 2 mg/ml ABTS, 10 units/ml peroxidase and 0.4 mM CuSO4 in 100 mM 
sodium phosphate buffer solution (pH 7.0), and was kept at room temperature for one 
day, covered with a shield (ABTS is light sensitive). Active colonies showed deep purple color 
formation. 

D. Assay Reagents and Conditions 

Some of the assays herein use CuSO4 and/or SDS. 

Copper sulfate is used to provide copper (II) ion to activate the recombinant (mutant or 
variant) enzyme. The activity of partially purified galactose oxidase from D. dendroides (Sigma) 
was detected well by using peroxidase and ABTS as described; the addition of copper (II) ion 
and other cofactors was not needed. (The Sigma enzyme already includes copper ions.) 
However, experiments with cell-free extracts of recombinant GAO enzymes of the invention 
showed that almost no activity was detected in the absence of copper (II) ions. Thus, the 
presence of copper (II) ion is preferred, and without being bound by any theory, is believed to 
be essential, to activate recombinant GAO enzymes produced by E. coli as described herein. 
Treatment with copper ions at 4 °C is preferred. Copper ion can be provided as copper sulfate 
(CuSO4). Experiments showed that 0.1 mM CuSO4 is sufficient, whereas 10 mM CuSO4 slightly 
inhibited GAO activity. Experiments under assay conditions showed that the preferred 
concentration of CuSO4 for activating crude enzyme solution is 0.4 mM. The metal (II) ions of 
iron, cobalt, nickel, and manganese, and the metal chelator EDTA, did not affect activation of 
the recombinant GAO in experiments under assay conditions. Experimental results obtained 
under assay conditions, with and without various metal (II) ions or EDTA, are shown in FIG. 3. 
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Detection enhancers. In certain assay embodiments, sodium azide or sodium sulfide may 
be added, for example in an amount of from about 0.01 mM to less than 1 mM. These reagents 
may enhance detection of GAO activity in some circumstances. 

Detergents. Addition of detergents to the assay solution also increased the observed 
activity. Pretreatment with SDS was most effective for increasing the galactose oxidase activity. 
Treatment with SDS for longer than 12 hours at 4 °C after treatment with lysozyme was suitable 
for the assay. The galactose oxidase activity did not change between 12 and 24 
hours of treatment at 4 °C. Cultivation, pre-treatment and assay were done as described above. 

Other detergents may also be used, as shown in TABLE 1. In these experiments, 
approximately 0.1 units/ml culture of E. coli BL21(DE3)/pGAO-010 and 0.25 units of partially 
purified galactose oxidase (Sigma) were used. Cells were treated with 0.5 mg/ml lysozyme at 
37 °C for 30 minutes. Enzyme and cells were treated with detergents at 4 °C for 1-12 hours. 
Galactose oxidase activities were assayed using the microplate method described above. 

Cultivation. Activation on an LB-Ap (100 mg/l) plate for 12-24 hours at 30 °C and seed 
cultivation in LB-Ap (100 mg/l), 200-500 µl/well, for 8-10 hours at 30 °C provided uniform 
growth for cultivation. These conditions are suitable, if not necessary, for the assay using the 
cells, reactants and reagents in these experiments. 

The addition of IPTG as an inducer was observed to be necessary for the expression of 
galactose oxidase in microplate cultivation in these experiments. Initial addition of IPTG to the 
medium was preferred to the addition of IPTG during cultivation. A cultivation time of 12-16 
hours was preferred, and provided superior results (overall higher activities) for almost all 
recombinant E. coli which had a plasmid for expression of galactose oxidase in these 
experiments. At 37 °C, the growth of cells stopped before 16 hours and the cell extracts had almost 
no activity. Cultivation at about 30 °C was optimal in these 
experiments. 

TABLE 1 

EXAMPLE 2 
Construction of Galactose Oxidase Plasmids 

Plasmids were constructed to express the galactose oxidase gene (gao) from Fusarium ssp. 
as described below. Several vectors were examined for high expression. Plasmids with different 
promoters and different sequences between the GAO gene and the ribosome binding site were 
constructed, as described. Escherichia coli strains BL21(DE3) and KY-14478 were transformed 
with these plasmids. Permeable cells from test tube cultures were used for the assay. 



CIT-3183 






ATTORNEY DOCKET 1G811-US1 



A. Construction of Plasmids 

1. Modified pUC18 Vector Plasmids 

Modified pUC18 plasmids were made for use in constructing galactose oxidase 
expression plasmids. As shown in FIG. 7, vector pUC18 was digested with the restriction 
enzyme HindIII, blunted with T4 DNA polymerase and ligated with T4 DNA ligase to create 
vector pUC18-HL lacking the HindIII site. pUC18-HL was digested with EcoRI, blunted with 
T4 DNA polymerase and ligated with T4 DNA ligase to create vector pUC18-EHL lacking the 
EcoRI and HindIII sites. Similarly, pUC18-EHL was digested with PstI, blunted with T4 DNA 
polymerase and ligated with T4 DNA ligase to create vector pUC18-EHPL, lacking the EcoRI, 
HindIII, and PstI sites. 
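
The reason each digest, blunt and re-ligate cycle removes a restriction site can be illustrated with a short sequence-level model. The sketch below is an idealized illustration and not the procedure used to build the plasmids: it treats the plasmid as a linear string containing exactly one copy of each site, and assumes complete fill-in of 5' overhangs and complete removal of 3' overhangs by T4 DNA polymerase. The toy sequence and function names are hypothetical; the recognition sequences and cut positions are the standard ones for these enzymes.

```python
# Recognition site and top-strand cut offset (bases 5' of the cut).
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT, leaves a 5' overhang
    "EcoRI": ("GAATTC", 1),    # G^AATTC, leaves a 5' overhang
    "PstI": ("CTGCAG", 5),     # CTGCA^G, leaves a 3' overhang
}


def blunt_and_religate(site, cut):
    """Top-strand sequence left at a palindromic site after cutting,
    blunting and self-ligation.

    A 5' overhang is filled in (the overhang is duplicated); a 3' overhang
    is removed (the overhang is deleted). Both cases reduce to the same slice.
    """
    n = len(site)
    return site[: n - cut] + site[cut:]


def destroy_site(sequence, enzyme):
    """Return `sequence` with its single `enzyme` site destroyed."""
    site, cut = ENZYMES[enzyme]
    if sequence.count(site) != 1:
        raise ValueError(f"expected exactly one {enzyme} site")
    return sequence.replace(site, blunt_and_religate(site, cut))


if __name__ == "__main__":
    # Hypothetical toy sequence carrying one HindIII, one EcoRI and one PstI site.
    vector = "GGGAAGCTTCCCGAATTCTTTCTGCAGAAA"
    for enzyme in ("HindIII", "EcoRI", "PstI"):
        vector = destroy_site(vector, enzyme)
        print(enzyme, "removed:", vector)
```

In this model the HindIII site AAGCTT becomes AAGCTAGCTT, the EcoRI site GAATTC becomes GAATTAATTC, and the PstI site CTGCAG collapses to CG, so none of the three recognition sequences remains.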

2. GAO Vector Plasmids 

As shown in FIG. 8, plasmid pGAO-010 expressing GAO was made using plasmid pR3. 
Plasmid pR3 contains the gene for mature galactose oxidase (GAO) fused to the 5' end of the 
lacZ fragment, and was obtained from Dr. Howard K. Kuramitsu (Dept. of Oral Biology, State 
University of New York, Buffalo, NY). The GAO gene was amplified from pR3 by PCR using 
primers P-MY001 and P-MY002 in order to introduce a HindIII restriction site followed by an 
ATG initiation codon immediately upstream from the mature GAO sequence, and an XbaI site 
immediately downstream from the stop codon. (Primer sequences are shown in FIG. 6.) The 
PCR product was digested with HindIII and XbaI and ligated into a similarly digested pUC18 
vector to create pGAO-001. Plasmid pPLA-001 is a modified pUC18 vector containing a double 
lac promoter. The lac promoter from pUC18 was amplified using primers P-MY003 and P-MY004. 
The PCR product was digested with EcoRI and HindIII and ligated into a similarly 
digested pUC18 vector. Following digestion of pGAO-001 with HindIII and XbaI, pPLA-001 
with EcoRI and HindIII, and pUC18-HL with EcoRI and XbaI, plasmid pGAO-010 was 
generated by ligation with T4 DNA ligase. 

Another plasmid, pGAO-036, was made by amplifying pGAO-010 using primers P-MY036 
and P-MY002 (FIG. 9). The PCR product was digested with KpnI and XbaI and ligated 
with a similarly digested pUC18-EHL to create plasmid pGAO-027. Plasmid pGAO-027 was 
digested with KpnI and XbaI and ligated with a similarly digested pUC18-EHPL to create 
plasmid pGAO-036. This plasmid contains a unique PstI site. Plasmid pGAO-036 was used 
for directed evolution experiments described herein. 

Another plasmid, pGAO-011, was made using similar techniques, as shown in FIG. 10. 

B. Plasmids and transformation 

Plasmids for expression of galactose oxidase were constructed as described above. The 
galactose oxidase gene was amplified from pR3 (Fusarium ssp.) by PCR. The lac promoter 
of pUC18 and the T7 promoter of pET-22b(+) (Novagen) were used for expression. In addition to 
expression as the mature sequence of galactose oxidase, expression of the gene as a fusion protein 
with other peptides was examined. The N-terminal sequence of LacZ was selected to express 
the galactose oxidase as a fusion protein (127). The PelB leader sequence was also used to produce 
galactose oxidase in the periplasm. Furthermore, a His-tag, which is useful for purification of 
recombinant proteins, was examined as an additional sequence at the C-terminus of galactose 
oxidase. The T7 terminator sequence was used for stabilization of expression. Two different 
origins of replication (ori) were chosen for replication of the plasmids. The copy number of a 
plasmid with the ori from the pUC series is higher than that of a plasmid with the ori from the 
pBR series. 

In more detail, plasmids pUC18, pET-22b(+) (Novagen) and derivatives were used as 
vector plasmids. The galactose oxidase gene from Fusarium ssp. was amplified from pR3 according 
to known techniques (110, 127). Genes were manipulated according to conventional methods 
using kits from Qiagen (Valencia, CA). The QIAprep Spin Miniprep Kit, QIAquick Gel 
Extraction Kit and QIAEX II Gel Extraction Kit were used, respectively, for purification of 
plasmids from cells, purification of DNA fragments and extraction of DNA fragments from 
agarose gel. E. coli DH5αMCR was transformed with plasmids by treatment with CaCl2 (19). 
Electroporation was used for transformation of E. coli BL21(DE3) with plasmids (147, 148). 

pUC18 and pET-22b(+) (Novagen) were used as vector plasmids. The gene of galactose 
oxidase from pR3 (127) was used. The lac promoter from pUC18, the tac promoter from pKK223-3 
(Amersham Pharmacia Biotech) and the T7 promoter from pET-22b(+) were selected for expression 
of the gene. The N-terminal sequence of LacZ from pUC18, and the PelB leader, His-tag and T7 
terminator sequences from pET-22b(+), were used for production of galactose oxidase. The 
gene and parts for expression were prepared by PCR. PCR was done in 100 µl of reaction 
solution containing PCR buffer (10 mM Tris-HCl, pH 8.5, 50 mM KCl, 2.5 mM MgCl2, 0.01 
% gelatin), 1 ng of DNA as template, 50 pmol of each primer, 2.5 units of Taq DNA 
polymerase (Perkin Elmer) and 50 nmol of each dNTP. DNA fragments were amplified in 30 
cycles of 30 seconds at 94 °C, 30 seconds at 50 °C and 60 seconds at 72 °C. PCR products 
were purified with the QIAquick PCR Purification Kit (Qiagen). Cutting and ligation of DNA with 
enzymes were performed according to "Molecular Cloning" (19). E. coli cells were transformed with 
plasmids by electroporation (Bio-Rad, Gene Pulser). The QIAprep Spin Miniprep Kit (Qiagen) was 
used for purification of plasmids from recombinant E. coli cells. 
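
The stated amounts can be converted to final concentrations, and the cycling program to a total run time, with simple arithmetic. The sketch below assumes a reaction volume of 100 microlitres (the unit of the stated volume is an assumption here) and ignores ramp times and any initial denaturation or final extension step; the function names are illustrative only.

```python
def concentration_um(amount_pmol, volume_ul):
    """Concentration in micromolar: pmol per microlitre equals µmol per litre."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_pmol / volume_ul


# Cycling program as stated: (step, seconds), repeated for 30 cycles.
CYCLE_STEPS_S = [("94 C", 30), ("50 C", 30), ("72 C", 60)]
N_CYCLES = 30


def total_cycling_minutes(steps=CYCLE_STEPS_S, cycles=N_CYCLES):
    """Total time spent in the programmed steps, in minutes."""
    return cycles * sum(seconds for _, seconds in steps) / 60


if __name__ == "__main__":
    reaction_volume_ul = 100  # assumed reaction volume
    print("primer, each (µM):", concentration_um(50, reaction_volume_ul))
    print("dNTP, each (µM):", concentration_um(50_000, reaction_volume_ul))  # 50 nmol
    print("cycling time (min):", total_cycling_minutes())
```

Under these assumptions each primer is present at 0.5 µM, each dNTP at 500 µM, and the 30 cycles take 60 minutes of programmed step time.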

Using these strategies, plasmids were designed to express the galactose oxidase gene. 
The plasmids were transformed into E. coli DH5αMCR, BL21(DE3) and KY-14478. 
Representative plasmids are shown diagrammatically in FIG. 11, according to the general 
scheme shown in FIG. 12. 

Expression of the galactose oxidase gene in all constructed plasmids was controlled by 
the lac operator. Therefore, induction by isopropyl β-D-thiogalactopyranoside (IPTG) was 
necessary for production of the enzyme (FIG. 11). The expression of galactose oxidase was 
highest when IPTG (1 mM) was added after cultivation for 7 hours and cells were incubated for 
6 more hours. Cultivation at 30 °C gave the greatest galactose oxidase activity per culture. 
Expression of the enzyme was markedly decreased at 37 °C. Temperatures lower than 27 °C 
were not suitable in the experiments because the cells grew very slowly. 

Incubation on LB plate at 30 °C for 18 hours and pre-cultivation in LB at 30°C for 12 
hours stabilized the main cultivation. The optimal culture conditions were selected as shown 
above. 

C. Galactose oxidase activity 

Galactose oxidase activities of the recombinant E. coli were measured (FIG. 11). Some 
recombinant strains showed much higher activities than the recombinant carrying plasmid pR3. These 
recombinants hold plasmids which were constructed with the lac promoter and the ori from the pUC series. 
Some recombinant E. coli with plasmids pGAO-018 and pGAO-023, expressing the galactose 
oxidase gene from the T7 promoter, did not grow well. Their galactose oxidase activities were not 
detected. Although some recombinants holding plasmids with the T7 promoter, pGAO-008 and 
pGAO-009, grew normally, they showed low galactose oxidase activity. From these results, the lac 
promoter was suitable for expression of the galactose oxidase gene. Furthermore, a double lac 
promoter seemed to be stronger than a single lac promoter in some but not all cases. 

For example, plasmid pGAO-025 was designed to have a double lac promoter and a lacZ-gao 
fused gene (FIG. 13). However, the galactose oxidase activity of a recombinant with pGAO-025 
was almost the same as that of a recombinant with pGAO-011, which had a single lac promoter, in 
KY-14478 cells, but was higher than that with pGAO-011 in BL21(DE3) cells. A triple lac promoter 
was also examined to express the galactose oxidase gene. The effect of the triple promoter was 
about the same as that of the double promoter, e.g. in pGAO-028 and pGAO-010 (FIGS. 15 and 17). 

Galactose oxidase fused with the N-terminal sequence of LacZ or the PelB leader 
was produced, as well as non-fused proteins. The activity of galactose oxidase fused with the PelB 
leader was not detected without a pre-treatment of cells. Detection of activity of the enzyme 
required the same pre-treatment of recombinant cells as the others. In these experiments GAO was 
not secreted into the medium, although a secretion signal sequence was present. 

Plasmids pGAO-003 and pGAO-005 were designed to produce galactose oxidase 
fused with a His-tag at its C-terminus. No galactose oxidase activity was detected from 
recombinant strains with these plasmids. 

A terminator sequence sometimes stabilizes gene expression. In these experiments, 
introduction of the T7 terminator sequence apparently did not increase GAO expression. Compare 
pGAO-020 with pGAO-010, or pGAO-022 with pGAO-017. 

E. coli DH5αMCR expressed the galactose oxidase gene with these plasmids. However, 
their activities were lower than those of recombinant strains of E. coli BL21(DE3) and E. coli 
KY-14478 (data not shown). E. coli BL21(DE3) and E. coli KY-14478 with plasmid pGAO-010 
or pGAO-027 successfully expressed galactose oxidase with high activity. These two plasmids 
have the same sequence except for one restriction endonuclease site in the vector sequence. 
Their structure is suitable for expressing the galactose oxidase as the mature fungal sequence. 
Consequently, E. coli BL21(DE3) and E. coli KY-14478 harboring plasmid pGAO-010, pGAO-027 
or their derivatives were used for continued experiments. 

D. Codon Alternation 

Codon alternation of the N-terminal sequence of a gene, without changing the peptide 
sequence, may cause higher expression of the gene in some cases. Codons of six N-terminal 
amino acid residues of galactose oxidase were exchanged randomly by PCR with a mixed primer, 
with the following alternations. 

                                                                SEQ ID NO: 

Peptide sequence      (M)  A   S   A   P   I   G   S   A            26 
Wild-type sequence    ATG GCC TCA GCA CCT ATC GGA AGC GCC ...       27 
Random alternation    --- --N --N --N --N --A --N ...               28 
                                            T 
                                            C 

The galactose oxidase gene of pGAO-010 was replaced with PCR products comprising 
the galactose oxidase gene with random codon alternation. The plasmids of this library were 
named pGAO-010M. This random codon alternation of the N-terminal sequence did not cause 
higher expression (FIG. 14), and in many cases GAO activity was reduced. No significant 
difference was observed when E. coli KY-14478 was used as a host strain, compared with E. 
coli BL21(DE3). 
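
The size of such a synonymous-codon library can be worked out by enumeration. The following is a minimal sketch, assuming the alternation scheme randomizes only the third base of the six codons for residues A, S, A, P, I and G, with any base allowed except at the isoleucine codon, where only A, T or C is allowed (ATG would encode methionine). The variable and function names are illustrative, and only the relevant slice of the standard genetic code is included.

```python
from itertools import product

# Wild-type codons for the six altered residues (A S A P I G), from the text.
WILD_TYPE_CODONS = ["GCC", "TCA", "GCA", "CCT", "ATC", "GGA"]
# Allowed third bases at each position under the assumed scheme.
THIRD_BASES = ["ACGT", "ACGT", "ACGT", "ACGT", "ATC", "ACGT"]

# Slice of the standard genetic code covering only these codon families.
CODON_TO_AA = {}
for prefix, amino_acid, thirds in [
    ("GC", "A", "ACGT"), ("TC", "S", "ACGT"), ("CC", "P", "ACGT"),
    ("AT", "I", "ACT"), ("GG", "G", "ACGT"),
]:
    for base in thirds:
        CODON_TO_AA[prefix + base] = amino_acid


def translate(codons):
    """Translate a list of codons into a one-letter peptide string."""
    return "".join(CODON_TO_AA[codon] for codon in codons)


def synonymous_variants():
    """Enumerate every codon combination allowed by the scheme."""
    choices = [
        [wild_type[:2] + base for base in thirds]
        for wild_type, thirds in zip(WILD_TYPE_CODONS, THIRD_BASES)
    ]
    return [list(variant) for variant in product(*choices)]


if __name__ == "__main__":
    variants = synonymous_variants()
    print(len(variants))                      # 4 * 4 * 4 * 4 * 3 * 4 = 3072
    print({translate(v) for v in variants})   # {'ASAPIG'}
```

Under this assumption the library contains 3072 nucleotide variants, all encoding the same peptide, so any change in activity would reflect expression rather than protein sequence.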

E. Optimization of upstream sequence of gao 

The region between the Shine-Dalgarno ("SD") sequence AGGA and the initiation 
codon, ATG, is important for efficient translation of the mRNA and has a significant influence on 
expression of a gene. One to three bases were inserted between the SD of the lac promoter and the 
ATG of the galactose oxidase gene in pGAO-027 to investigate the impact of altering the 
distance between SD and ATG. A change in the length of the region between SD and ATG 
caused a decrease in galactose oxidase activity when E. coli BL21(DE3) was used as a host 
strain (TABLE 2; SEQ ID NOS: 29-36). The original sequence of pGAO-027 or the one-base 
extended sequence of pGAO-029 was preferred for expression of the gene. When E. coli KY-14478 
was used as a host strain, a one- or two-base extension of the sequence between SD and 
ATG was preferred for expression of the gene. 
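
The spacing between SD and ATG can be read directly from a sequence string. The sketch below is illustrative only: the sequences are transcribed from TABLE 2 as printed, the SD motif is taken to be the first AGGA, and the start codon the first ATG after it. The function and variable names are hypothetical.

```python
# Sequences between and including SD and ATG, transcribed from TABLE 2 as printed.
TABLE_2_SEQUENCES = {
    "pGAO-027": "AGGAAAAGCTTATG",
    "pGAO-029": "AGGAAAAAGCTTATG",
    "pGAO-030": "AGGAAACAAGCTTATG",
    "pGAO-031": "AGGAACAAAGCTTATG",
}


def spacer(sequence, sd="AGGA", start="ATG"):
    """Return the bases between the first SD motif and the next start codon."""
    sd_end = sequence.index(sd) + len(sd)
    start_pos = sequence.index(start, sd_end)
    return sequence[sd_end:start_pos]


if __name__ == "__main__":
    for plasmid, sequence in TABLE_2_SEQUENCES.items():
        gap = spacer(sequence)
        print(plasmid, gap, len(gap))
```

With the sequences as printed, the spacer is 7 bases in pGAO-027 and 8 bases in pGAO-029, and every spacer ends in AAGCTT, the HindIII recognition sequence that immediately precedes the ATG.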

TABLE 2 

                                                       GAO Activity (units/ml) 
Plasmid(a)   Sequence between SD and ATG   Promoter    BL21(DE3)    KY-14478 
027          ...AGGAAAAGCTTATG...          Plac        19.0         12.5 
029          ...AGGAAAAAGCTTATG...         Plac        19.1         15.7 
030          ...AGGAAACAAGCTTATG...        Plac        16.3         15.9 
031          ...AGGAACAAAGCTTATG...        Plac        14.3         13.1 
032          ...AGGAAAAGCTTATG...          Ptac        30.6         52.4 
033          ...AGGAAAAAGCTTATG...         Ptac        25.7         56.2 
034          ...AGGAAACAAGCTTATG...        Ptac        34.6         49.8 
035          ...AGGAACAAAGCTTATG...        Ptac        22.1         38.7 

(a) Plasmids are designated pGAO-XXX, where XXX is 027 through 035. 



The tac promoter often, if not usually, expresses genes at higher levels than the lac promoter. 
The tac promoter was prepared from pKK223-3 (Amersham Pharmacia Biotech) by PCR. The lac 
promoters of plasmids pGAO-027, pGAO-029, pGAO-030 and pGAO-031 were replaced with 
the tac promoter. Recombinant strains with plasmids using the tac promoter for expression showed 
approximately twice as much activity as the recombinant strains using the lac promoter (TABLE 
2). The optimal distance between SD and ATG under the tac promoter was almost the same as 
that under the lac promoter in both E. coli strains. 

Recombinant strains E. coli BL21(DE3)/pGAO-034 and E. coli KY-14478/pGAO-033 
were considered to be good for expression of galactose oxidase. Optimal culture conditions for 
these strains were as described above. 



F. Properties of recombinant galactose oxidase 

Galactose oxidase from Dactylium dendroides (Fusarium ssp.) and the enzyme from 
recombinant E. coli BL21(DE3)/pGAO-010 differ only in glycosylation; their amino acid 
sequences are identical. 

Substrate specificities of recombinant galactose oxidase from E. coli and the enzyme 
from fungi were compared. Cell-free extract of E. coli BL21(DE3)/pGAO-010 was used as a 
crude recombinant enzyme from E. coli. Partially purified galactose oxidase from Dactylium 
dendroides (Sigma) was used as the fungal enzyme. The substrate specificities of these 
two enzymes were almost the same (FIG. 15). 

EXAMPLE 3 
Optimization of error-prone PCR conditions 

A. General PCR Conditions 

Mutation of the galactose oxidase gene (gao) was induced by error-prone PCR 
according to known techniques (66, 129-133, 136-139). Wild-type gao on pGAO-027 was 
replaced by the PCR products, which were mutant galactose oxidase genes. The resultant 
plasmids were named pGAO-027M. E. coli BL21(DE3) was transformed with these 
plasmids. Almost all transformants carrying error-prone PCR products instead of wild-type gao 
lost their galactose oxidase activities (FIG. 7). Mutations were induced in the whole galactose 
oxidase gene by error-prone PCR, using conditions "A" of TABLE 3. 228 clones were selected 
randomly from each set of conditions with different manganese concentrations. These clones 
were cultivated and assayed with micro-plates. More than 65 % of transformants lost their 
galactose oxidase activity, even when manganese ions were not added to the PCR solution. 

Various reaction conditions for error-prone PCR were compared, and in particular 
milder conditions were examined for mutation of the galactose oxidase gene. Conditions "A" 
and "C" are the previous conditions of error-prone PCR (above) and normal PCR conditions, 
respectively. The use of a buffer solution for error-prone PCR (Buffer EP) increased the error 
rate. A non-uniform composition of dNTPs for error-prone PCR (dNTPs EP) induced mutations 
at a higher rate than the uniform composition of dNTPs for normal PCR (dNTPs normal). Taq DNA 
polymerase from Promega Corporation showed a higher error rate than the enzyme from Perkin 
Elmer. Since the rate of inactivation was 31 % at most in condition "C" (FIG. 5), induction of 
mutation was not optimal, and may have been insufficient. In FIG. 5, mutations were induced 
in the whole galactose oxidase gene by error-prone PCR using conditions "C" of TABLE 3. 
Activities of 288 clones from each set of conditions with different manganese concentrations were 
estimated using micro-plate screening. 

From the alternatives examined in these experiments, error-prone PCR condition "F" had 
a suitable frequency of error and was selected to induce mutation on the galactose oxidase gene 
in further experiments. The composition of the buffer solution, the content of dNTPs and the 
thermophilic DNA polymerase each affected the rate of mutation. For example, the difference 
between the buffer solution for normal PCR and the buffer solution for error-prone PCR was that 
the EP buffer contained gelatin. Since gelatin is not expected to influence the error rate of the 
PCR reaction, the observed rate difference may be due to a small difference in the final pH of 
reaction mixtures with these buffer solutions. More error was induced by the non-uniform content 
of dNTPs for error-prone PCR than by the uniform content of dNTPs for normal PCR. Selection of 
the thermophilic DNA polymerase can be significant when optimizing an error-prone PCR 
experiment, as the particular polymerase may influence the mutation rate. 

The PCR conditions selected for mutation of the whole galactose oxidase gene in these 
experiments were milder than previously disclosed conditions (66, 129-133, 136-139). When the 
PCR conditions described previously were used for error-prone PCR of the galactose oxidase gene, 
the mutation rate was too high, resulting in too many inactive or low-activity clones. This result 
may be related to the fact that the galactose oxidase gene is as much as twice as large as genes 
previously used for error-prone PCR in the literature. Without being bound by any theory, 
inactivating mutations may be induced more frequently as the target gene becomes larger. 

In TABLE 3, 96 or 288 clones were selected randomly from each library. Their 
galactose oxidase activities were estimated by the micro-plate screening method. The percentages 
of clones that lost their galactose oxidase activity are shown in the table. 

FIG. 4 and FIG. 5 show the effect of varying amounts of MnCl2 in these experiments. 

In the mutagenesis methods used herein, the error rate is from 1-6 mutations per 
polynucleotide, preferably 4-6, and most preferably 6. In certain embodiments with more than 
one round of directed evolution, the error rate may be different from one round to another. For 
example, the error rate may be about 1-2 mutations per polynucleotide in one round (e.g. a first 
round), and may be about 4-6 mutations per polynucleotide in another round (e.g. a second 
round). 
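
As a rough illustration of what an average error rate implies for a library, the number of mutations carried by each gene copy can be modeled as Poisson-distributed around the stated mean. This is a simplifying assumption made here for illustration only and is not a model stated in this disclosure; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k mutations when the average per gene is `mean`.

    Assumes mutations are independent and Poisson-distributed (illustrative).
    """
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_unmutated(mean: float) -> float:
    """Fraction of gene copies expected to carry no mutation at all."""
    return poisson_pmf(0, mean)


if __name__ == "__main__":
    # Mean error rates taken from the ranges mentioned in the text above.
    for mean in (1, 2, 4, 6):
        print(f"mean {mean} mutations/gene: "
              f"{fraction_unmutated(mean):.1%} of copies unmutated")
```

Under this assumption, raising the mean from 1-2 to 4-6 mutations per polynucleotide sharply reduces the share of unmutated copies, which is one way to see why the rate may be tuned from round to round.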



TABLE 3 

PCR conditions                                         Inactivated clones [%] (inactive/screened) at MnCl2 concentration 

Cond. Buffer  MgCl2   dNTPs   Taq DNA polymerase       0 mM          0.1 mM        0.15 mM       0.2 mM        0.4 mM        0.5 mM 
A     EP      7 mM    EP      Promega, 50 u/ml         60 (173/288)  69 (199/288)  77 (223/288)  76 (220/288)  90 (258/288)  94 (270/288) 
B     EP      7 mM    normal  Promega, 50 u/ml         55 (53/96)    61 (59/96) 
C     normal  2.5 mM  normal  Perkin Elmer, 25 u/ml    3 (3/96)      10 (10/96) 
                                                       5 (14/288)    9 (27/288)    10 (29/288)   11 (31/288)   28 (81/288)   31 (90/288) 
D     EP      7 mM    EP      Perkin Elmer, 25 u/ml    45 (43/96)    61 (59/96) 
E     EP      7 mM    EP      Perkin Elmer, 50 u/ml    39 (37/96)    52 (50/96) 
F     normal  7 mM    EP      Perkin Elmer, 25 u/ml    23 (22/96)    41 (39/96) 
G     normal  7 mM    EP      Promega, 50 u/ml         41 (39/96)    52 (50/96) 
H     EP      7 mM    normal  Promega, 50 u/ml         51 (49/96)    61 (59/96) 

Buffer EP : (x10) 500 mM KCl, 100 mM Tris-HCl (pH 8.3), 0.1% (w/v) gelatin 
Buffer (normal) : (x10) 500 mM KCl, 100 mM Tris-HCl (pH 8.3) 
dNTPs EP : 0.2 mM dGTP, 0.2 mM dATP, 1 mM dCTP, 1 mM dTTP 
dNTPs (normal) : 0.5 mM dGTP, 0.5 mM dATP, 0.5 mM dCTP, 0.5 mM dTTP 
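
The percentages in TABLE 3 follow directly from the clone counts given in parentheses. The short sketch below recomputes them for condition "A"; the function name and the dictionary layout are illustrative choices, not part of the disclosure.

```python
def percent_inactive(inactive: int, screened: int) -> int:
    """Percentage of inactive clones, rounded to the nearest whole percent."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return round(100 * inactive / screened)


# Condition "A" counts from TABLE 3, keyed by MnCl2 concentration in mM.
condition_a = {
    0.0: (173, 288),
    0.1: (199, 288),
    0.15: (223, 288),
    0.2: (220, 288),
    0.4: (258, 288),
    0.5: (270, 288),
}

if __name__ == "__main__":
    for mn_mM, (inactive, screened) in condition_a.items():
        pct = percent_inactive(inactive, screened)
        print(f"{mn_mM:>4} mM MnCl2: {pct}% inactive ({inactive}/{screened})")
```

The same helper can be applied to any other row of the table by substituting that row's counts.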

EXAMPLE 4 
Production of Galactose Oxidase Mutants 

The directed evolution of galactose oxidase (GAO) is described. GAO variants with 
increased activity toward allyl alcohol and D-galactose and increased thermostability relative to 
wild-type have been identified. 

A. Construction of GAO Mutant Libraries 

Plasmid pGAO-036, expressing wild-type GAO, was used as the parent for the directed 
evolution of GAO (FIG. 9). 

Two strategies have been followed for the directed evolution of the enzyme: (A) 
mutagenesis of the whole GAO gene (bases 1-1917) and (B) mutagenesis of part of the GAO 
gene (bases 518-1917). In Approach A, two rounds of error-prone PCR (45) have been 
performed (generations A1 and A2), followed by one round of StEP recombination (generation 
A3) (139) of four improved variants identified in library A2. In Approach B, four rounds of 
error-prone PCR (45) have been performed (generations B1 through B4). E. coli strain 
BL21(DE3) (Novagen) was used for the expression of GAO. 
1. Approach A 

Error-prone PCR was carried out in a 100 µl reaction mixture containing about 0.3 µg 
plasmid DNA as template, 30 pmol of each primer, 0.2 mM dGTP, 0.2 mM dATP, 1 mM dCTP, 
1 mM dTTP, 7 mM MgCl2, 0.1 mM MnCl2, and 2.5 U Taq polymerase (Perkin Elmer) in 10 mM 
Tris-HCl, 50 mM KCl buffer, pH 8.5. PCR conditions were as follows: 30 cycles of 94 °C for 
30 seconds, 50 °C for 30 seconds and 72 °C for 60 seconds. The percentage of inactive clones 
was between 30 and 50%. 

StEP recombination of the four improved variants identified in generation A2 was 
performed in a 100 µl reaction mixture containing about 0.3 mg (total) plasmid DNA as template 
(prepared by mixing equal amounts of all four plasmids), 10 pmol of each primer, 0.5 mM of 
each dNTP, 2.5 mM MgCl2, and 5 U Taq polymerase (Perkin Elmer) in 10 mM Tris-HCl, 50 
mM KCl buffer, pH 8.5. PCR conditions were: 95 °C for 3 minutes and 100 cycles of 94 °C 
for 30 seconds and 58 °C for 10 seconds. The primers used for error-prone PCR and StEP were: 
5'-AATTCGAAGCTTATGGCCTCAGCACCTATCGGAAGC-3' (forward) [SEQ. ID. NO. 1] 
and 5'-CTTCCTTCTAGATTACTGAGTAACGCGAATCGT-3' (reverse) [SEQ. ID. NO. 2]. 
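
The two primers above carry restriction sites near their 5' ends, which is consistent with the HindIII/XbaI digestion shown for these primers in the construction figures. The sketch below locates the sites by plain string search; the recognition sequences (HindIII: AAGCTT, XbaI: TCTAGA) are standard, while the function and variable names are illustrative.

```python
# Standard recognition sequences for the two enzymes of interest.
SITES = {"HindIII": "AAGCTT", "XbaI": "TCTAGA"}

# Primer sequences as given for SEQ. ID. NO. 1 and SEQ. ID. NO. 2.
FORWARD = "AATTCGAAGCTTATGGCCTCAGCACCTATCGGAAGC"
REVERSE = "CTTCCTTCTAGATTACTGAGTAACGCGAATCGT"


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based start index} for each site found in the primer."""
    return {name: primer.find(seq) for name, seq in SITES.items() if seq in primer}


if __name__ == "__main__":
    print("forward:", find_sites(FORWARD))
    print("reverse:", find_sites(REVERSE))
```

Both recognition sequences are palindromic, so searching one strand is sufficient for this check.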
2. Approach B 

Error-prone PCR was carried out in a 100 µl reaction mixture containing 10 ng plasmid 
DNA as template, 50 pmol of each primer, 0.2 mM of each dNTP, 7 mM (generations B1 and 
B2) or 4 mM MgCl2 (generations B3 and B4), and 5 U Taq polymerase (Boehringer Mannheim) 
in 10 mM Tris-HCl, 50 mM KCl buffer, pH 8.3. PCR conditions were as follows: 94 °C for 2 
minutes and 25 cycles of 94 °C for 30 seconds, 58 °C for 30 seconds and 72 °C for 60 seconds. 
The primers used were: 

5'-TTGTTCCTGCGGCTGCAGCAATTGAACCG-3' (forward) [SEQ. ID. NO. 8] and 
5'-TGCCGGTCGACTCTAGATTACTGAGTAACG-3' (reverse) [SEQ. ID. NO. 9]. 
The percentage of inactive clones was between 30 and 40%. 

B. Screening of GAO Libraries 

GAO activity was screened in 96-well plates, using the methods of Approaches A and 
B, respectively, as described in Example 1(D). 

C. Laboratory Evolution of GAO 

The thermal stability curves of selected GAO variants are shown in FIG. 16. Variants 
were grown in test tubes (3 ml cultures). Following centrifugation and resuspension of the cell 
pellets in NaPi buffer, pH 7.0, containing CuSO4, the cells were lysed. Aliquots of the cell 
extracts were heated at each temperature for 10 min and then cooled down on ice for 10 min 
before the residual activity toward D-galactose was determined at room temperature. 

Results of the laboratory evolution of GAO to increase activity and thermostability are 
listed in TABLE 4. T50 is an operational measure of stability and is defined as the temperature 
at which the enzyme loses 50% of its activity following incubation for a set time. 
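
Given a residual-activity curve of the kind described above, T50 can be estimated by locating where the curve crosses 50%. The sketch below uses linear interpolation between adjacent temperature points; this is one plausible way to read T50 off such a curve, not necessarily the procedure used for TABLE 4, and the example data are hypothetical.

```python
def estimate_t50(temps, residual_pct):
    """Estimate T50 by linear interpolation of a residual-activity curve.

    temps: incubation temperatures in ascending order (degrees C).
    residual_pct: residual activity (% of unheated control) at each temperature.
    Returns the temperature at which the curve first falls through 50%.
    """
    points = list(zip(temps, residual_pct))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 >= 50.0 >= a1 and a0 != a1:
            # Interpolate between the two points that bracket 50%.
            return t0 + (a0 - 50.0) * (t1 - t0) / (a0 - a1)
    raise ValueError("residual activity does not cross 50% in the given range")


if __name__ == "__main__":
    # Hypothetical curve for illustration only (not data from this disclosure).
    temps = [30, 40, 50, 60]
    residual = [100, 90, 30, 5]
    print(f"estimated T50: {estimate_t50(temps, residual):.1f} degrees C")
```

A finer temperature grid around the transition would reduce the interpolation error.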

Wild-type GAO (pGAO-036) was used as the parent for generation A1 of GAO variants. After 
screening about 1500 clones, three mutants, 9.16.8D2, 9.16.6C11 and 9.16.16D12, were 
identified as more active toward allyl alcohol and/or galactose. Clone 9.16.16D12, which was 
also more thermostable than wild-type GAO, was used to parent generation A2 of GAO variants. 
Four improved mutants were identified in this library following screening of about 1500 clones: 
11.03.6D3, 11.03.10C3, 11.03.10D6 and 11.03.13E12. These clones were more active than the 
parent toward allyl alcohol and galactose. Clone 11.03.10C3 was substantially more 
thermostable than the parent, as well. These four improved variants were recombined by StEP 
in generation A3. Screening of about 2000 clones led to the identification of variant 1.06.20E7, 
which shows about a 200-fold increased activity toward allyl alcohol and D-galactose and 
exhibits about a 12 °C higher T50 with respect to wild-type GAO. 

Wild-type GAO (pGAO-036) was used as the parent for generation B1 of GAO variants. 
After screening about 900 clones, variant 1.D4 was identified as more active toward galactose 
and used to parent generation B2. Mutant 2.G4 was identified as more active toward galactose 
in this library following screening of about 1500 clones. Library B3 of GAO variants was 
generated using 2.G4 as the parent, and clone 3.H7 was identified as an improved variant after 
screening about 1500 clones. Finally, library B4 was created using 3.H7 as the parent and about 
1500 clones were screened. Variant 4.F12 was identified as about 15-fold more active toward 
galactose relative to wild-type GAO. 
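
To see how the improvement accumulated over the four rounds of Approach B, the relative D-galactose activities reported for this lineage in TABLE 4 can be converted to per-round gains. The sketch below is simple arithmetic on those reported values; the function name is illustrative.

```python
# Relative activity toward D-galactose for the Approach B lineage (TABLE 4).
LINEAGE_B = [
    ("wild type", 1.0),
    ("1.D4", 2.4),
    ("2.G4", 4.0),
    ("3.H7", 8.6),
    ("4.F12", 15.2),
]


def step_gains(lineage):
    """Return (variant, fold gain over its parent) for each generation."""
    return [
        (name, activity / parent_activity)
        for (_, parent_activity), (name, activity) in zip(lineage, lineage[1:])
    ]


if __name__ == "__main__":
    for name, gain in step_gains(LINEAGE_B):
        print(f"{name}: {gain:.2f}-fold over parent")
```

Each round contributes roughly a two-fold gain, and the product of the per-round gains recovers the overall figure of about 15-fold.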

D. Active and Thermostable Mutations 

Most beneficial mutations occur in domains II and III of the GAO gene (residues 156-532 
and 533-639, respectively) (87). Mutation V494A, which was identified several times in the 
screen, is located at the bottom of the active site adjacent to the copper ligand Y495. Its 
presence increases the binding affinity for galactose approximately 3-fold. N535D is found in 
a solvent-exposed loop in domain III. The amino acid substitution G195E is largely responsible 
for the observed increase in thermostability of variant 1.06.20E7 relative to wild-type. See FIG. 
16 and TABLE 4. 






It should also be noted that a large number of mutations (five in these experiments) 
resulted from the substitution of a neutral residue by a negatively charged residue. This tends 
to decrease the isoelectric point of GAO in the mutants (the pI of wild-type GAO is 12). A 
decrease in pI is advantageous, in that it may lead to fewer interactions between the mutant GAO 
and other macromolecules, and lower adhesion to glass. It may also permit increased use of 
crude galactose oxidase preparations in organic synthesis (107). 



TABLE 4 

Mutations identified in GAO variants and their effects on GAO properties. 

                                                                                                    Relative       Relative 
                                                                                                    activity for   activity for 
Gen.  GAO name     Nucleotide base substitution               Amino acid substitution                allyl alcohol* D-galactose    T50 (°C) 
0     pGAO-036     N/A                                        -                                      1.0            1.0            42 
A1    9.16.8D2     A1609G                                     N537D                                  2.6            4.6            - 
A1    9.16.6C11    T1481C, T1543A                             V494A, C515S                           2.8            1.3            - 
A1    9.16.16D12   T1481C, T408C                              V494A, P136                            3.0            4.9            44 
A2    11.03.6D3    T1481C, T408C, T28C                        V494A, P136, S10P                      6.4            11             - 
A2    11.03.10C3   T1481C, T408C, G584A, A9C                  V494A, P136, G195E, A3                 3.8            9.6            54 
A2    11.03.10D6   T1481C, T408C, A936G, A1603G, T654C        V494A, P136, L312, N535D, T218         5.4            11             - 
A2    11.03.13E12  T1481C, T408C, A208G                       V494A, P136, M70V                      5.1            9.1            - 
A3    1.06.20E7    T1481C, T28C, T408C, A208G, G584A, A1603G  V494A, S10P, P136, M70V, G195E, N535D  20             55             54 
B1    1.D4         A1237G                                     N413D                                  -              2.4            - 
B2    2.G4         A1237G, T1650A                             N413D, S550                            -              4.0            - 
B3    3.H7         A1237G, T1650A, T1481C                     N413D, S550, V494A                     -              8.6            - 
B4    4.F12        A1237G, T1650A, T1481C, T1830A             N413D, S550, V494A, S610               -              15.2           - 

*Allyl alcohol is oxidized by wild-type GAO at ca. 3% the rate of galactose oxidation. 


Mutations identified at residues A3, L312, T218, P136, S550 and S610 are synonymous 
and, without being bound by theory, the observed increase in activity is probably due to higher 
expression of GAO in E. coli. Given the low expression level of recombinant wild-type GAO 
(less than 3% of total intracellular protein as determined by SDS-PAGE), this is a much needed 
improvement. 
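
The nucleotide and amino acid labels in TABLE 4 are related by simple codon arithmetic: base n falls in codon (n - 1) // 3 + 1. The sketch below assumes that nucleotide numbering starts at the first base of the codon for residue 1, which is consistent with every pair in TABLE 4 (for example T1481C with V494A, and T408C with P136); the function names are illustrative.

```python
import re


def parse_substitution(label: str):
    """Split a label such as 'T1481C' into (original base, position, new base)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a nucleotide substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def codon_of(nt_position: int):
    """Return (residue number, position within codon 1-3) for a 1-based base.

    Assumes base 1 is the first base of the codon for residue 1.
    """
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


if __name__ == "__main__":
    for label in ("T1481C", "T408C", "A1609G", "A1237G", "T1830A"):
        _, position, _ = parse_substitution(label)
        residue, codon_pos = codon_of(position)
        print(f"{label}: residue {residue}, codon position {codon_pos}")
```

Substitutions that fall on the third codon position, such as T408C at P136, are the ones most likely to be synonymous, in line with the silent mutations listed above.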

The variants identified also exhibit increased activity toward a variety of GAO substrates. 
Mutant 1.06.20E7 is about 200-fold more active toward 3-pyridylcarbinol and mutant 4.F12 is 
about 15-fold more active toward glycerol, xylitol, beta-D-lactose, and IPTG. 

The sequences of representative mutants of the invention identified in TABLE 4 are 
shown in FIGS. 17-28. 

As shown in the above Examples, the galactose oxidase gene can be expressed in E. coli 
in relatively high yield, with an increased activity toward at least one substrate. In certain 
embodiments the activity is greatly increased toward several substrates. In certain embodiments 
the mutants exhibit thermostability. 

The inducible promoters Plac or Ptac were effective for expression of the galactose 
oxidase gene and are preferred. Much higher expression may be possible when other strong 
promoters are used. However, some strong promoters may be counterproductive. For example, 
E. coli did not grow well when the T7 promoter, which is stronger than the lac promoter, was used for 
expression of the galactose oxidase gene. Double promoters of Plac-Plac or Plac-Ptac were 
selected to express the galactose oxidase gene. Double promoters expressed the gene more strongly 
than a single promoter, as seen by comparing pGAO-025 and pGAO-011. Triple promoters expressed the 
gene as well as double promoters. The upper promoter of the double promoters seemed to be less 
effective than the lower promoter in the Examples. Therefore, double promoters of Plac-Plac or 
Plac-Ptac are preferred. Induction of the gene by IPTG was necessary when the lac promoter or tac 
promoter was used. The timing of induction and the incubation time after induction were optimized. 

In these experiments the fused form of GAO (i.e. as a fusion protein with lacZ) was not 
found to provide advantages, and was not necessary to express the fungal gene. 

Galactose oxidase generally had reduced activity or lost its activity when codons were 
altered or when it was produced as a fused enzyme with a His-tag. The culture conditions were also 
important for production of the enzyme. 

Galactose oxidase was engineered by directed evolution to produce more active variants 
toward natural and additional substrates. Activity of the present mutants was as high as about 
65 times that of wild-type GAO. Mutants of the invention also are more stable than wild-type, 
and in particular exhibit improved thermal stability. 

CLAIMS 

What is claimed is : 

1. A method of obtaining and improving the production of a functional galactose oxidase 
polypeptide by a host cell comprising the steps of: 

(a) providing at least one parent galactose oxidase polynucleotide encoding a parent 
galactose oxidase polypeptide, 

(b) altering the nucleotide sequence of the parent polynucleotide by random 
mutagenesis to produce a population of mutant polypeptides; 

(c) transforming host cells to express the mutant polypeptides; 

(d) screening for first-generation functional mutants produced by the host cells and 
having at least one modified property; 

(e) selecting at least one polynucleotide encoding a first-generation mutant as a 
parent polynucleotide; and 

(f) repeating a round of altering, transforming and screening steps at least once to 
obtain at least one other generation of one or more mutants. 

2. The method of claim 1 wherein the method of random mutagenesis comprises an error- 
prone polymerase chain reaction. 

3. The method of claim 2, wherein the error-prone polymerase chain reaction employs 
unbalanced nucleotide concentrations. 

4. The method of claim 2, wherein the error-prone polymerase chain reaction employs 
manganese ions in a concentration of about 0 to about 500 µM. 

5. The method of claim 2, wherein the error-prone polymerase chain reaction employs 
manganese ions in a concentration of about 100 µM. 

6. The method of claim 2, wherein the polymerase chain reaction generates an error rate of 
about 1-2 mutations per polynucleotide. 



7. The method of claim 2, wherein the polymerase chain reaction generates an error rate of 
up to about six mutations per polynucleotide. 

8. The method of claim 1, wherein at least one of the altering, transforming and screening 
steps are changed in at least one repeated round. 

9. The method of claim 1, wherein the conditions for random mutagenesis in at least one 
repeated round of altering, transforming and screening are different from the conditions 
in any other round of altering, transforming and screening. 

10. The method of claim 1, wherein the host cells in at least one repeated round of altering, 
transforming and screening are different from the host cells in any other round of 
altering, transforming and screening. 

11. The method of claim 10, wherein the host cells in at least one round are bacterial cells. 

12. The method of claim 11, wherein the bacterial cells are E. coli cells. 

13. The method of claim 1, wherein at least one round of altering, transforming and 
screening comprises screening for a property of the polypeptide that was not screened 
for in another round of altering, transforming and screening. 



14. The method of claim 13, wherein at least one property is selected from the group 
consisting of enzyme activity, enzyme selectivity, enzyme stability, and enzyme yield. 

15. The method of claim 1, wherein each screening step comprises screening for one or more 
of the biological activity of the polypeptide, the selectivity of the polypeptide, the 
stability of the polypeptide, and the yield of expressed polypeptide. 

16. The method of claim 2, wherein the error rate in the altering step of at least one round 
of altering, transforming and screening is about 1-2 mutations per polynucleotide, and 
the error rate in the altering step of at least one other round is about 4-6 mutations per 
polynucleotide. 

17. The method of claim 2, wherein the polymerase chain reaction employs manganese ions 
in a concentration of about 0.35 mM. 

18. The method of claim 1, wherein screening comprises pre-screening for mutant colonies 
using nitrocellulose membranes. 

19. A polynucleotide evolved according to the method of claim 8. 

20. A polynucleotide encoding for a galactose oxidase which has a mutation in at least one 
amino acid selected from the group consisting of A3, S10, M70, P136, G195, T218, 
L312, N413, V494, C515, N535, N537, S550, and S610. 

21. A polynucleotide encoding for a galactose oxidase which has at least one amino acid 
mutation selected from the group consisting of S10P, M70V, G195E, N413D, V494A, 
C515S, N535D, and N537D. 

22. A polynucleotide encoding for a galactose oxidase which has the amino acid mutation 
N537D. 

23. A polynucleotide encoding for a galactose oxidase which has the amino acid mutation 
V494A. 

24. The polynucleotide of claim 23, further comprising the amino acid mutation C515S. 

25. The polynucleotide of claim 23, further comprising the amino acid mutation S10P. 

26. The polynucleotide of claim 23, further comprising a silent mutation at P136. 

27. The polynucleotide of claim 25, further comprising a silent mutation at P136. 

28. The polynucleotide of claim 23, further comprising the amino acid mutation G195E. 

29. The polynucleotide of claim 28, further comprising a silent mutation in at least one of A3 
and P136. 

30. The polynucleotide of claim 23, further comprising the amino acid mutation N535D. 

31. The polynucleotide of claim 30, further comprising a silent mutation in at least one of 
P136, L312, and T218. 

32. The polynucleotide of claim 23, further comprising the amino acid mutation M70V. 

33. The polynucleotide of claim 32, further comprising a silent mutation at P136. 

34. A polynucleotide encoding for a galactose oxidase which has the amino acid mutations 
V494A, S10P, M70V, G195E and N535D. 

35. The polynucleotide of claim 34, further comprising a silent mutation at P136. 

36. A polynucleotide encoding for a galactose oxidase which has the amino acid mutation 
N413D. 

37. The polynucleotide of claim 36, further comprising a silent mutation at S550. 

38. The polynucleotide of claim 23, further comprising the amino acid mutation N413D. 

39. The polynucleotide of claim 38, further comprising a silent mutation in at least one of 
S550 and S610. 

40. A polynucleotide encoding for a galactose oxidase which has a nucleotide mutation in 
at least one of positions 9, 28, 208, 408, 584, 654, 830, 936, 1237, 1481, 1543, 1603, 
1609, 1650, and 1830. 

41. The polynucleotide of claim 40, wherein the mutation at any of positions 9, 408, 654, 
936, 1650 and 1830 is a silent mutation. 

42. The polynucleotide of claim 40, which has a mutation in at least one of nucleotide 
positions 28, 408, 654, and 1481, wherein a thymine is replaced by a cytosine. 

43. The polynucleotide of claim 40, which has a mutation in at least one of nucleotide 
positions 1543, 1650 and 1830, wherein a thymine is replaced by an adenine. 

44. The polynucleotide of claim 40, which has a mutation in at least one of nucleotide 
positions 206, 936, 1237, 1603, and 1609, wherein adenine is replaced by guanine. 

45. A polynucleotide encoding for a galactose oxidase which has at least one nucleotide 
mutation in a region encompassed by nucleotides selected from the group consisting of: 
(a) 1 through 30; 
(b) 200 through 700; 

(c) 800 through 1000; and 

(d) 1200 through 1650. 

46. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
encompassed by nucleotides 1-30, wherein a thymine is replaced by a cytosine. 

47. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
encompassed by nucleotides 1450-1550, wherein a thymine is replaced by one of a 
cytosine and an adenine. 

48. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
encompassed by nucleotides 1200-1250, wherein an adenine is replaced by a guanine. 

49. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
encompassed by nucleotides 1600-1650, wherein an adenine is replaced by a guanine. 

50. The polynucleotide of claim 45, which has a nucleotide mutation in a region proximate 
to and encompassing nucleotide 208, wherein an adenine is replaced by a guanine. 

51. The polynucleotide of claim 45, which has a nucleotide mutation in a region proximate 
to and encompassing nucleotide 585, wherein a guanine is replaced by an adenine. 

52. The polynucleotide of claim 45, which has a nucleotide mutation in a region proximate 
to and encompassing nucleotide 1543, wherein a thymine is replaced by an adenine. 

53. A polynucleotide encoding for a galactose oxidase which has at least one of the 
nucleotide mutations A9C, T28C, A208G, T408C, G584A, T654C, A936G, A1237G, 
T1481C, T1543A, A1603G, A1609G, T1650A, and T1830A. 

54. The polynucleotide of claim 53, which has the nucleotide mutation T1481C. 

55. The polynucleotide of claim 54, further comprising the nucleotide mutation T1543A. 

56. The polynucleotide of claim 54, further comprising the nucleotide mutation T408C. 

57. The polynucleotide of claim 56, further comprising a nucleotide mutation selected from 
the group consisting of G584A, A1603G, and A208G. 

58. The polynucleotide of claim 56, further comprising at least one of the nucleotide 
mutations A9C, A936G, and T654C. 

59. The polynucleotide of claim 56, further comprising the nucleotide mutations T28C, 
A208G, G584A and A1603G. 

60. The polynucleotide of claim 53 which has the nucleotide mutation A1237G. 

61. The polynucleotide of claim 60, further comprising at least one of the nucleotide 
mutations selected from the group consisting of T1650A, T1830A, and T1481C. 

62. The polynucleotide of claim 61, having the nucleotide mutations A1237G, T1650A, 
T1481C and T1830A. 

63. A galactose oxidase which has a mutation in at least one amino acid selected from the 
group consisting of A3, S10, M70, P136, G195, T218, L312, N413, V494, C515, N535, 
N537, S550, and S610. 



64. A galactose oxidase which has at least one of the amino acid mutations S10P, M70V, 
G195E, N413D, V494A, C515S, N535D, and N537D. 

65. The galactose oxidase of claim 64, which has the amino acid mutation N537D. 

66. The galactose oxidase of claim 64, which has the amino acid mutation V494A. 

67. The galactose oxidase of claim 66, further comprising the amino acid mutation C515S. 

68. The galactose oxidase of claim 66, further comprising the amino acid mutation S10P. 

69. The galactose oxidase of claim 66, further comprising a silent mutation at P136. 

70. The galactose oxidase of claim 68, further comprising a silent mutation at P136. 

71. The galactose oxidase of claim 66, further comprising the amino acid mutation G195E. 

72. The galactose oxidase of claim 71, further comprising a silent mutation in at least one of 
A3 and P136. 

73. The galactose oxidase of claim 66, further comprising the amino acid mutation N535D. 

74. The galactose oxidase of claim 73, further comprising a silent mutation in at least one of 
P136, L312, and T218. 

75. The galactose oxidase of claim 66, further comprising the amino acid mutation M70V. 

76. The galactose oxidase of claim 75, further comprising a silent mutation at P136. 

77. The galactose oxidase of claim 64, which has the amino acid mutations S10P, M70V, 
G195E, V494A and N535D. 

78. The galactose oxidase of claim 77, further comprising a silent mutation at P136. 

79. The galactose oxidase of claim 64, which has the amino acid mutation N413D. 

80. The galactose oxidase of claim 80, further comprising a silent mutation at S550. 

81. The galactose oxidase of claim 66, further comprising the amino acid mutation N413D. 

82. The galactose oxidase of claim 81, further comprising a silent mutation in at least one of 
S550 and S610. 

ABSTRACT OF THE DISCLOSURE 
This invention relates to the expression of improved polynucleotide and polypeptide 
sequences encoding for eukaryotic enzymes, particularly oxidase enzymes. The enzymes are 
advantageously produced in conventional or facile expression systems. Various methods for 
directed evolution of polynucleotide sequences can be used to obtain the improved sequences. 
The improved characteristics of the polypeptides or proteins generated in this manner include 
improved expression, enhanced activity toward one or more substrates, and increased thermal 
stability. In a particular embodiment, the invention relates to improved expression of the 
galactose oxidase gene and galactose oxidase enzymes. GAO mutants that are highly active 
and/or thermostable are disclosed. 

PCR primers 

Primer  Sequence                                                 SEQ. ID NO. 
MY001   5'-AAT TCG AAG CTT ATG GCC TCA GCA CCT ATC GGA AGC-3'    1 
MY002   5'-CTT CCT TCT AGA TTA CTG AGT AAC GCG AAT CGT-3'        2 
MY003   5'-GGA AGA GAA TTC AAT ACG CAA ACC GCC TCT-3'            3 
MY004   5'-GGT CAT AAG CTT TTC CTG TGT GAA ATT GTT AT-3'         4 
MY005   5'-ACC ATG ATT TCG ACG TCG GTA CCC TCA GCA-3'            5 
MY009   5'-CTT CCT AAG CTT TCA CTG AGT AAC GCG AAT-3'            6 
MY036   5'-GGA AGA GGT ACC AAT ACG CAA ACC GCC TCT-3'            7 

[Figure: construction of pUC18-EHL (2694 bp) from pUC18 (2686 bp; Plac, lacZ). In successive steps the HindIII, EcoRI and PstI sites of the multiple cloning site were each cut, blunted with T4 DNA polymerase and re-ligated with T4 DNA ligase. Junction sequences shown: ...AAGCTAGCTT... (SEQ ID NO. 22), ...TTCGATCGAA... (SEQ ID NO. 23), ...GAATTAATTC... (SEQ ID NO. 24), ...CTTAATTAAG... (SEQ ID NO. 25).] 




P-MY001 AAT TCG AAG CTT ATg TCA GCA CCT ATC GGA 
P-MY002 CTT CCT TCT AGA TTA CTG AGT AAC GCG AAT CGT 
P-MY003 GGA AGA GAA TTC AAT ACG CAA ACC GCC TCT 
P-MY004 GGT CAT AAG CTT TTC CTG TGT GAA ATT GTT AT 

[Figure: plasmid construction scheme. One fragment was amplified by PCR with primers P-MY001 and P-MY002 and cut with HindIII and XbaI; a second fragment was amplified by PCR with primers P-MY003 and P-MY004 and cut with EcoRI and HindIII. Cut fragments and vectors were joined with T4 DNA ligase.] 


P-MY005 ACC ATG ATT TCG AGC TCG GTA CCC TCA GCA 
P-MY009 CTT CCT AAG CTT TCA CTG AGT AAC GCG AAT 



[Figure: construction of pGAO-006 (4887 bp; Plac, lacZ, gao) from pR3 (5486 bp). The gao gene was amplified by PCR with primers P-MY005 and P-MY009, cut with SacI and HindIII, and joined to the cut vector with T4 DNA ligase.] 
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GAOacti vuies [units/ml-cukure] 



Plasmid (vector) Host strain DH5aMCR BL21(DE3) KY-I4478 



Induction - IPTG - IPTG IPTG 



Pt<tr OVr icrZ 



pR3 

pGAOO03 

pGAO-004 

pGAOOOS 

pGAO-006 

pGAO-007 

pGAO-008 

pGAO-009 

pGAO-010 

pG AO-Oil 

pGAO-014 

pGAO-015 

pGAOOI6 

pGAO-017 

pGAO-018 

pGAO-019 

pGAO-020 

pGAO-021 

pGAO-022 

pGACK)23 

pGAO024 



(pUCI18) 

(pET22b(+)) 

(pET22b(+)) 

(pET22b(+)) 

(pUC118) 

(pET22b(+)) 

(pET22b(+)) 

(pET22b(+)) 

(pUC18) 

(pUC18) 

(pUC 18) 

(pUCi8) 

(pUC18) 

(pUC18) 



[construct diagrams (promoter, operator, leader and terminator elements) not reproduced]

(pUC18)
(pUC18)
(pUC18)
(pUC18)
(pUC18)
(pUC18)
(pUC18)



0.01 
0 
0 
0 

1.22 1.72



0.02 0 
0 0 



0 



0 
0 



0.04 0.04 

0 0.01 

0 0 

0.19 0.15 



0 



0 0.01 
0.02 0 
0.03 0.08 



0.01 
0 
0 
0 

0.08 
0.05 
0.03 
0.02 
0.67 
0.01 



0.03
0 
0 
0 

1.35 
0 

0.01 
0.03 
1.43 
0.85 



0.03 

0.06 
_ 

_ ** 
0.97 
0.03 
0.12 



0.04 
0.47 



2.21 
0.31 
0.93 



0.31 
0 
0 
0 

0.87 
0 
0 
0 

0.40 

0.41 

0 

0 

0 

0.31 



0.22 
0.24 
0.14 



FIG. 12 



promoter 
(operator) 



N-terminal 
addition 



gao 



C-terminal
addition 



terminator 



(alternation of codons) 



Plac Olac
PT7



lacZ N-terminal
pelB leader



His tag 



on 



0-m 




TT7 Ori (pUC)
Ori (pBR)



FIG. 13 



Plasmid 



GAO activity [units/ml] (+ IPTG) 
BL21(DE3) KY-14478 



pGAO-011 

(pUC18) 
pGAO-025 

(pUC18) 
pGAO-010 

(pUC18) 
pGAO-027 

(pUC18) 
pGAO-028 

(pUC18)

[construct diagrams (Plac, Olac and lacZ elements) not reproduced]
Relative activities of galactose oxidase [%]

Substrate (100 mM)                        D. dendroides (Sigma)   E. coli BL21(DE3)/pGAO-010

D-Galactose                               100                     100
D-Glucose                                 0                       0
D-Sucrose                                 0                       0
α-D-Lactose                               20                      17
β-D-Lactose                               42                      32
D-Raffinose                               114                     110
D-Melibiose                               75                      75
Benzyl alcohol (25 % Methanol)            15                      11
2-Hydroxybenzyl alcohol                   (+)                     (+)
2-Pyridylcarbinol                         14                      15
3-Pyridylcarbinol                         50                      46
4-Pyridylcarbinol                         32                      29
Cyclohexylmethanol (45 % Methanol) 1)     1.9                     2.1
Tetrahydropyran-2-methanol 2)             0                       0
Cyclopentanemethanol (30 % Methanol) 3)   0.42                    0.25
Tetrahydrofurfuryl alcohol 4)             n.d.                    n.d.
Glycerol                                  4.1                     3.4
Ethylene glycol                           0.45                    0.16
1-Propanol                                0                       0
1,2-Propanediol                           (+)                     (+)
Acetol                                    13                      13
Allyl alcohol                             4.6                     3.6



1), 2), 3), 4): structural formulas of the footnoted alcohols, each drawn with a CH2OH group [drawings not reproduced]

Date : 2000.04.10 

Mutant ID : 9.16.8D2

Mutation : N537D (A1609G)

Sequence Size : 1917 



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS 

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAI DGN KDT FWH 

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTI DMK 

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG 

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV 

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI 

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR 

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG 

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA
ITLTSSWDPSTGIVSDRTVT 

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGI SMDGNGQIV 

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD 
850 860 870 880 890 900

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA
GRVFTIGGSWSGGVFEKNGE



910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRS DNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG 

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG 

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTS PNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GTC TAC CAT AGC ATT TCC CTT 
DTFYKQN PNSIVRVYHSISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC GAC GGC AAT CTC 
NHFDAQIFTPNYLYNSDGNL

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT 



1750 1760 1770 1780 1790 1800

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT
HTVNTDQRRIPLTLTNNGGN

1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ
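The mutation labels pair an amino-acid change with a nucleotide change, for example N537D (A1609G). The following Python sketch shows how the two numbers relate, assuming 1-based numbering from the first base of the listing and the standard genetic code. The function names are illustrative, and the parental row is copied from the 9.16.6C11 listing, which does not carry this mutation.

```python
from itertools import product

# Standard genetic code in TCAG order (general knowledge, not taken from the source).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_number(nt_pos):
    """1-based codon index that contains the 1-based nucleotide position."""
    return (nt_pos - 1) // 3 + 1


def substitute(seq, first_pos, old, nt_pos, new):
    """Apply a point substitution such as A1609G to a fragment whose
    first base has the 1-based coordinate first_pos."""
    i = nt_pos - first_pos
    if seq[i] != old:
        raise ValueError(f"expected {old} at {nt_pos}, found {seq[i]}")
    return seq[:i] + new + seq[i + 1:]


def residue(seq, first_pos, codon):
    """Translate one codon (1-based codon index) of the fragment."""
    i = (codon - 1) * 3 + 1 - first_pos
    return CODON_TABLE[seq[i:i + 3]]


# Row 1561-1620 with the parental codon AAC at position 537.
PARENT_ROW = ("AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA "
              "AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC").replace(" ", "")

mutant_row = substitute(PARENT_ROW, 1561, "A", 1609, "G")

if __name__ == "__main__":
    print(codon_number(1609),
          residue(PARENT_ROW, 1561, 537),
          residue(mutant_row, 1561, 537))
```

The same arithmetic places T1481C in codon 494 and T408C in codon 136, in agreement with the V494A and P136 labels used in the listings.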
Date : 2000.04.10

Mutant ID : 9.16.6C11

Mutation : V494A(T1481C), C515S(T1543A)

Sequence Size : 1917 



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS 

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAI DGNKDT FWH 

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG 

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAI TEANGQPWTS I 

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR 

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG 

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT 

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGI SMDGNGQIV 

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
I PGPDMQVARGYQSSATMSD 



FIG. 18A 



FIG. 18B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFT IGGSWSGGVFEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG 

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHI ITLG 

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTS PNTVFASNGLY FART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGST FI TGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNS IVRAYHS ISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT AGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLSGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQI FTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLI RYGTAT 

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN

1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 18C 



Date : 2000.04.10

Mutant ID : 9.16.16D12

Mutation : P136(T408C), V494A(T1481C)

Sequence Size : 1917



FIG. 19A



10 20 30 40 50 60

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA
ITLTSSWDPSTGIVSDRTVT



670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPG I SMDGNGQIV 

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
I PGPDMQVARGYQSSATMSD 



FIG. 19B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRS DN HAWL F G W 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAG PSTAMNWYYTS 

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKI LTFG 

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHI I TLG 

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLY FART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGST FITGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQI FT PNYLYNSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT 

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRI PLTLTNNGGN 



1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 19C 



Date : 2000.04.13

Mutant ID : 11.03.6D3

Mutation : S10P(T28C), P136(T408C), V494A(T1481C)

Sequence Size : 1917



FIG. 20A



10 20 30 40 50 60

GCC TCA GCA CCT ATC GGA AGC GCC ATT CCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT
ASAPIGSAIPRNNWAVTCDS

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA
VTKHDMFCPGISMDGNGQIV

730 740 750 760 770 780

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG
VTGGNDAKKTSLYDSSSDSW



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
I PGPDMQVARGYQSSATMSD 



FIG. 20B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFT IGGSWSGGVFEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYS PSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG 

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GS PDYQDSDATTNAHI ITLG 

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQI FTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT 



1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRI PLTLTNNGGN 



1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 20C 



Date : 2000.04.10

Mutant ID : 11.03.10C3

Mutation : A3(A9C), P136(T408C), G195E(G584A), V494A(T1481C)

Sequence Size : 1917



FIG. 21A



10 20 30 40 50 60

GCC TCA GCC CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GAA GGA TCC CCT GGT GGT
GRVLMWSSYRNDAFEGSPGG

610 620 630 640 650 660

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA
VTKHDMFCPGISMDGNGQIV

730 740 750 760 770 780

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG
VTGGNDAKKTSLYDSSSDSW



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
I PGPDMQVARGYQSSATMSD 



FIG. 21B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
G R V FT I GGSWSGGV FEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG 

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GS PDYQDS DATTNAHIITLG 

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTS PNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQI FTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKI TRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSS ISKASLIRYGTAT 

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN 



1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML



1870 1880 1890 1900 1910 1920 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG 
FVMNSAGVPSVASTIRVTQ 



FIG. 21C 






FIG. 22A 



Date : 2000.04.10

Mutant ID : 11.03.10D6

Mutation : P136(T408C), T218(T654C), L312(A936G), V494A(T1481C), N535D(A1603G)

Sequence Size : 1917



10 20 30 40 50 60

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG
WGPTIDLPIVPAAAAIEPTS



550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG 

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACC GTG ACA 
ITLTSSWDPSTGIVSDRTVT 

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGI SMDGNGQIV 

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
I PGPDMQVARGYQSSATMSD 



FIG. 22B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTG CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRS DN HAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKI LTFG 

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHI ITLG 

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTS PNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC GAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYDSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN 



1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 22C 



Date : 2000.04.10

Mutant ID : 11.03.13E12

Mutation : M70V(A208G), P136(T408C), V494A(T1481C)

Sequence Size : 1917



FIG. 23A



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS 



70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT GTG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC
TTQNVNGLSVLPRQDGNQNG

250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT
WIGRHEVYLSSDGTNWGSPV



310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 



370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI 

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR 

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 



550 560 570 580 590 600

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA
VTKHDMFCPGISMDGNGQIV

730 740 750 760 770 780

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG
VTGGNDAKKTSLYDSSSDSW



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPG PDMQVARGYQSSATMSD 



FIG. 23B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT
FHTSVVLPDGSTFITGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT 

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN



1810 1820 1830 

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT 
SYSFQVPSDS 

1870 1880 1890 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT 
FVMNSAGVPS 



1840 1850 1860 

GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
GVALPGYWML 

1900 1910 1920 

GTG GCT TCG ACG ATT CGC GTT ACT CAG 
VASTIRVTQ



FIG. 23C 



FIG. 24A 



Date : 2000.04.10

Filename : 1.06.20E7

Mutation : S10P(T28C), M70V(A208G), P136(T408C), G195E(G584A), V494A(T1481C), N535D(A1603G)

Sequence Size : 1917



10 20 30 40 50 60

GCC TCA GCA CCT ATC GGA AGC GCC ATT CCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAIPRNNWAVTCDS

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK 

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT GTG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSVLPRQDGNQNG 

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR 

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GAA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFEGSPGG

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT 

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA
VTKHDMFCPGISMDGNGQIV 

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD 



FIG. 24B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG 

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC GAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYDSNGNL

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT



1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN 



1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
SYSFQVPSDSGVALPGYWML 



1870 1880 1890 1900 1910 1920 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG 
FVMNSAGVPSVASTIRVTQ 



FIG. 24C



Date : 2000.04.11

Mutant ID : 1.D4

Mutation : N413D(A1237G)

Sequence Size : 1917



FIG. 25A 



10 20 30 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT 
ASAPIGSAIS 

70 80 90 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC 
AQSGNECNKA 

130 140 150 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG 
TFYGANGDPK 



40 50 60 

CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
RNNWAVTCDS 

100 110 120 

ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
IDGNKDTFWH 

160 170 180 

CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
PPHTYTIDMK 



190 200 210 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG 
TTQNVNGLSM 

250 260 270 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC 
WIGRHEVYLS



220 230 240 

CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
LPRQDGNQNG

280 290 300 

TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
SDGTNWGSPV 



310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP



370 380 390 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT 
ARYVRLVAIT 

430 440 450 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT 
AEINVFQASS 

490 500 510 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT 
WGPTIDLPIV 



400 410 420 

GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
EANGQPWTSI 

460 470 480 

TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
YTAPQPGLGR 

520 530 540 

CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
PAAAAIEPTS



550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG



610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT 

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD



FIG. 25B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE 

910 920 930 940 950 960

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 

AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC GAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTDAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GTC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRVYHSISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT 

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN

1810 1820 1830 

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT 
SYSFQVPSDS

1870 1880 1890 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT 
FVMNSAGVPS 



1840 1850 1860 

GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
GVALPGYWML 

1900 1910 1920 

GTG GCT TCG ACG ATT CGC GTT ACT CAG 
VASTIRVTQ 



FIG. 25C 



FIG. 26A



Date : 2000.04.11

Mutant ID : 2.G4

Mutation : N413D(A1237G), S550(T1650A)

Sequence Size : 1917



10 20 30 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT 
ASAPIGSAIS 



40 50 60 

CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
RNNWAVTCDS 



70 80 90 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC 
AQSGNECNKA 

130 140 150 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG 
TFYGANGDPK 



100 110 120 

ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
IDGNKDTFWH 

160 170 180 

CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
PPHTYTIDMK



190 200 210 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG 
TTQNVNGLSM 

250 260 270 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC 
WIGRHEVYLS



220 230 240 

CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
LPRQDGNQNG 

280 290 300 

TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
SDGTNWGSPV 



310 320 330 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT 
ASGSWFADST

370 380 390 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT 
ARYVRLVAIT 

430 440 450

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT 
AEINVFQASS 

490 500 510 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT 
WGPTIDLPIV 



340 350 360 

ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
TKYSNFETRP 

400 410 420 

GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
EANGQPWTSI

460 470 480 

TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
YTAPQPGLGR 

520 530 540 

CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
PAAAAIEPTS 



550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG 

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT 

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD



FIG. 26B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC GAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTDAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GTC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRVYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCA ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT 

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN



1810 1820 1830 

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT
SYSFQVPSDSG 

1870 1880 1890 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG 
FVMNSAGVPSV 



1840 1850 1860 

GTT GCT TTG CCT GGC TAC TGG ATG TTG 
VALPGYWML 

1900 1910 1920 

GCT TCG ACG ATT CGC GTT ACT CAG 
ASTIRVTQ



FIG. 26C 



FIG. 27A



Date : 2000.04.11

Mutant ID : 3.H7

Mutation : N413D(A1237G), S550(T1650A), V494A(T1481C)

Sequence Size : 1917



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS 



70 80 90 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC 
AQSGNECNKA 

130 140 150 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG 
TFYGANGDPK 



100 110 120 

ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
IDGNKDTFWH 

160 170 180 

CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
PPHTYTIDMK



190 200 210

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG 
TTQNVNGLSM 

250 260 270 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC 
WIGRHEVYLS 

310 320 330 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT 
ASGSWFADST 

370 380 390 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT 
ARYVRLVAIT 

430 440 450 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT 
AEINVFQASS 

490 500 510 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT 
WGPTIDLPIV 

550 560 570 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC 
GRVLMWSSYR 

610 620 630 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC 
ITLTSSWDPS 

670 680 690 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT 
VTKHDMFCPG 

730 740 750 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC 
VTGGNDAKKT 



220 230 240 

CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
LPRQDGNQNG 

280 290 300 

TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
SDGTNWGSPV 

340 350 360 

ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
TKYSNFETRP 

400 410 420 

GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
EANGQPWTSI

460 470 480 

TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
YTAPQPGLGR 

520 530 540 

CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
PAAAAIEPTS

580 590 600 

AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
NDAFGGSPGG

640 650 660 

ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
TGIVSDRTVT 

700 710 720 

ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
ISMDGNGQIV 

760 770 780 

AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
SLYDSSSDSW 



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD



FIG. 27B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC GAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTDAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCA ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT



1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN



1810 1820 1830 

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT 
SYSFQVPSDS 

1870 1880 1890 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT 
FVMNSAGVPS 



1840 1850 1860 

GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
GVALPGYWML 

1900 1910 1920 

GTG GCT TCG ACG ATT CGC GTT ACT CAG 
VASTIRVTQ 



FIG. 27C 






FIG. 28A 



Date : 2000.04.11 

Mutant ID : 4.F12

Mutation : N413D(A1237G), S550(T1650A), V494A(T1481C), S610(T1830A)

Sequence Size : 1917 
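
The mutation entries in these figure headers pair a residue number with a nucleotide change, for example N413D(A1237G). The short Python sketch below is illustrative only and is not part of the original figures; it checks that the nucleotide position in an entry falls within the stated codon, assuming 1-based numbering from the first base shown in the figure. The names `parse_mutation`, `codon_of` and `is_consistent` are hypothetical helpers. Entries with no replacement residue, such as S550(T1650A), appear to denote silent changes (TCT and TCA both encode serine).

```python
import re

# Matches entries such as "N413D(A1237G)" or the silent "S550(T1650A)".
# Assumed notation: <wild-type residue><codon number>[<new residue>](<base><position><base>)
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z]?)\(([ACGT])(\d+)([ACGT])\)")


def parse_mutation(entry):
    """Split one header entry into its parts (hypothetical helper)."""
    match = MUTATION.fullmatch(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation entry: {entry!r}")
    aa_from, codon, aa_to, nt_from, nt_pos, nt_to = match.groups()
    return {
        "aa_from": aa_from,
        "codon": int(codon),
        "aa_to": aa_to,  # empty string when no replacement residue is given
        "nt_from": nt_from,
        "nt_pos": int(nt_pos),
        "nt_to": nt_to,
    }


def codon_of(nt_pos):
    """Return the 1-based codon number containing a 1-based nucleotide position."""
    return (nt_pos - 1) // 3 + 1


def is_consistent(entry):
    """True when the nucleotide position lies inside the stated codon."""
    mutation = parse_mutation(entry)
    return codon_of(mutation["nt_pos"]) == mutation["codon"]


if __name__ == "__main__":
    header = "N413D(A1237G), S550(T1650A), V494A(T1481C), S610(T1830A)"
    for item in header.split(", "):
        print(item, is_consistent(item))
```

Under the same numbering assumption, nucleotide 1237 is the first base of codon 413, which is why an A-to-G change there converts AAC (Asn) to GAC (Asp) in the sequence that follows.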



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS 

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK 

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG 

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR 

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG 

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT 

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD 



FIG. 28B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC GAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTDAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL 

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCA ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN



1810 1820 1830 

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCA 
SYSFQVPSDS 

1870 1880 1890 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT 
FVMNSAGVPS 



1840 1850 1860 

GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
GVALPGYWML 

1900 1910 1920 

GTG GCT TCG ACG ATT CGC GTT ACT CAG 
VASTIRVTQ 



FIG. 28C

connected therewith: Gordon D. Coplein #19,165, William F. Dudine, Jr. #20,569, Michael J. Sweedler #19,937, S. Peter Ludwig #25,351, Paul Fields #20,298, Marc S. Gross
#19,614, Joseph B. Lerch #26,936, Melvin C. Garner #26,272, Ethan Horwitz #27,646, Beverly B. Goodwin #28,417, Adda C. Gogoris #29,714, Martin E. Goldstein #20,869,
Bert J. Lewen #19,407, Henry Sternberg #22,408, Peter C. Schechter #31,662, Robert Schaffer #31,194, Robert C. Sullivan, Jr. #30,499, Ira J. Levy #35,587, Joseph R.
Robinson #33,448, Scott G. Lindvall #40,325

all of the firm of DARBY & DARBY P.C., 805 Third Avenue, New York, NY 10022 
SEND CORRESPONDENCE TO:
DARBY & DARBY P.C.
805 Third Avenue
New York, NY 10022

DIRECT TELEPHONE CALLS TO:
Robert Schaffer
212-527-7700



FULL NAME AND RESIDENCE OF INVENTOR 1

LAST NAME: ARNOLD FIRST NAME: Frances MIDDLE NAME: K.

CITY: Pasadena STATE OR FOREIGN COUNTRY: California COUNTRY OF CITIZENSHIP: USA

POST OFFICE ADDRESS: 629 S. Grand Avenue CITY: Pasadena STATE OR COUNTRY: California ZIP CODE: 91105



FULL NAME AND RESIDENCE OF INVENTOR 2

LAST NAME: PETROUNIA FIRST NAME: Ioanna MIDDLE NAME: P.

CITY: Pasadena STATE OR FOREIGN COUNTRY: California COUNTRY OF CITIZENSHIP: USA

POST OFFICE ADDRESS: 385 S. Catalina Avenue, Apt. 206 CITY: Pasadena STATE OR COUNTRY: California ZIP CODE: 91106



FULL NAME AND RESIDENCE OF INVENTOR 3 

LAST NAME: SUN FIRST NAME: Lianhong MIDDLE NAME: 

CITY: Pasadena STATE OR FOREIGN COUNTRY: California COUNTRY OF CITIZENSHIP: USA 

POST OFFICE ADDRESS: 307 S. Wilson Avenue, Apt. 1 CITY: Pasadena STATE OR COUNTRY: California ZIP CODE: 91106 



I further declare that all statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be true; and further 
that these statements were made with the knowledge that willful false statements 
and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code, and that such willful false statements 
may jeopardize the validity of the application or any patent issuing thereon. 



SIGNATURE OF INVENTOR 1: DATED: 



SIGNATURE OF INVENTOR 2: DATED: 



SIGNATURE OF INVENTOR 3: DATED:



